This is an analysis of studies done in the years 1920-1957 which contrast the quality of performance by individuals and by groups in diverse situations. A number of studies are included which add to our understanding of this aspect of human behavior. An unpublished review by Lorge et al. (37), prepared in 1953, served as an important source for this presentation; in fact, some of the organization of that report is carried into this study. The existence of this report is due to a literature search made in connection with research into group performance and group process in problem solving.

It is important to focus on basic concepts such as “group,” “task,” and “criterion.” These terms have been applied in so many different senses and in so many different situations that clarification and differentiation are necessary for the interpretation of the research results.

The most ambiguous term seems to be that of “group,” which not only is recognized in a variety of senses by lexicographers, but also is used with a wide range of meanings by social psychologists. The lexicographer considers a group as: (a) an assemblage of persons in physical proximity considered as a collective unity, e.g., a group by aggregation; and (b) a unity of a number of persons classed together because of any kind of common relation, whether of organization or of commitment. The social psychologist recognizes three kinds of groups: (a) an assemblage of persons in a physical environment; (b) an association of persons with some form of social, political, or managerial organization; and (c) a collective unity of members subscribing to a common symbol or loyalty. Sapir, in the Encyclopedia of the Social Sciences, distinguishes three classes of groups: (a) persons at a football game or in a train; (b) organizationally defined, as having some mutuality of purpose, e.g., employees in a factory or pupils in a classroom; and (c) symbolically defined, as serving some well-recognized function or functions, e.g., family, military staff, or executive cabinet.

A military staff or an executive cabinet, however, achieves its collective unity as a consequence of its members having interacted with one another over a considerable period of time, so that they have developed a tradition of working together for mutual and common purposes. This viewpoint allows one to think of the group as continuously emergent: the longer its members work together, the greater the possibility of developing a more cohesive and more cooperative team. Group cohesiveness, moreover, may be one of the resultants of interaction among a team’s members that leads to the development of a group or team “tradition.” The social psychologist tends to think of the “group” as having a “tradition,” i.e., a cooperative association of individuals whose members have progressed through the states of coming together in physical proximity, of organizing for common goals, and of accepting commitment for the group’s purposes. The members of a traditioned group will have assayed each other as resources and as personalities, will have established channels of communication, and will have achieved mutual reinforcement for the common goal. The traditioned group, therefore, is a functioning unity, functioning for a real and genuine goal. While the world’s work is accomplished by many traditioned groups or teams or staffs, such traditioned groups have not been studied extensively, primarily because of the difficulty of access and their unwillingness to have others observe their processes.

Methodologically, it is important for social psychology to develop an understanding of the changing dynamics of emerging groups. To do this, social psychologists have usually worked with ad hoc groups, i.e., some experimenter has assembled several individuals to work together mutually and cooperatively on some specific and externally assigned task. An ad hoc group, therefore, may represent one end of a continuum of “group” which extends from the just-assembled ad hoc group to the well-established, traditioned group. Ad hoc groups, necessarily, will vary in the extent of cohesion that they achieve, as well as in the acceptance of the mutuality of purposes. Each externally designated ad hoc group, therefore, in some more or less tentative way, must organize, test its members’ resources, accept the task goal, muster its resources to reach that goal, and then accomplish its end. Such experimental ad hoc groups usually cease to exist when the experimenter’s purposes have been achieved. The research use of ad hoc groups is exemplified in the experiments of Watson (70) and of Shaw (55). Each selected college students at random from the same class to form a group for the experimenter’s purposes, and then only for the duration of the experiment.

A common and dangerous practice is to generalize principles valid for ad hoc groups to traditioned groups, treating the ad hoc group as a microscopic model of the traditioned group. This might be justified, but it has not been experimentally validated. It is equally possible that ad hoc and traditioned groups behave in accordance with distinct principles of their own.

The continuum of ad hoc to traditioned groups, therefore, constitutes an ambiguous and complex semantic range for interacting, face-to-face groups who deliberate to solve problems or produce joint products. In sharp contrast to this continuum of groups whose members do interact are the so-called groups whose individuals do not overtly interact with one another. Rather, these are groups only because their constituent units are in physical proximity. In social psychology, groups by physical proximity have been utilized primarily in research involving an individual’s performance in a sociophysical setting of other individuals. Usually, the experiments have been studies of “social facilitation,” designed to appraise the psychological consequences for the individual of working in a mass or among one’s fellows. Some psychologists have termed the sociophysical setting the “climatized group,” but it must be recognized that such a group is a group only in the sense of having members in physical proximity.

Three variations of the “climatized group” are reported in the research literature. Of these, the type nearest to the real group provides for group discussion of a problem followed by individual judgments or estimations. Such a “climatized group” has interaction among individuals but no measure of group consensus. The jury experiments of Bechterev and Lange (7) and of Burtt (9) illustrate the pattern: e.g., the credibility of the testimony of witnesses is discussed by the “jury,” followed by judgment by each individual on the issue.

The second variety of “climatized group” does not provide for discussion, but rather is a sequel to individual evaluation of group judgment, or is an evaluation made by some open form of voting, like a show of hands. Gurnee’s (23) 1937 experiment employs such a “climatized group”: college students first took a true-false examination as individuals, and then repeated the same examination as a group. Group choice was determined by a show of hands. While “visual” interaction, through observation of other members’ voting behavior, was evident, it certainly was not the overt verbal interaction of deliberation.


The third variety of “climatized” group has neither interaction nor consensus. In such a “climatized group,” the individual works alone at his task in the presence of other people, as, for example, in the social facilitation experiments of Allport (1) and of Dashiell (15), in which individuals either worked in isolation or in the presence of others, but without any interaction.

The research literature frequently refers to a type of group which is really not a group at all; rather, it is a consequence of statistical computations, i.e., of averaging the products of independent and noninteracting individuals. The “statisticized” group, for instance, was used in the 1924 study of Kate Gordon (21), in which college students as individuals judged weights. These individual judgments were then averaged to form “groups” of 5, or of 10, or of 20, or of 50. Since such a statistical “group” neither meets nor interacts, it does not function as a psychological entity. There is dubious semantic advantage in designating the consequence of such statistical averaging or aggregating as a “group” product. Experiments with the “statisticizing” technique may be more appropriately considered as evidence about the reliability of measurement (of one judge versus several judges) rather than about group dynamics. Basically, the “statisticized” group appraises aggregation, not interaction.
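The “statisticizing” procedure amounts to nothing more than averaging. As a minimal sketch (the function name `statisticized_groups` and the sample estimates are hypothetical, not data from any study reviewed here), the following Python forms “groups” of size k from independent individual estimates, much as Gordon formed groups of 5, 10, 20, and 50:

```python
from statistics import mean


def statisticized_groups(estimates, k):
    """Split independent individual estimates into consecutive blocks of k
    and return the mean of each block. Estimates left over after the last
    complete block are ignored. No 'member' of such a group ever meets or
    interacts with another: the group exists only in the arithmetic."""
    n_groups = len(estimates) // k
    return [mean(estimates[i * k:(i + 1) * k]) for i in range(n_groups)]


# Hypothetical estimates from ten individuals who worked alone.
individual_estimates = [64, 70, 75, 68, 80, 72, 66, 78, 71, 74]
print(statisticized_groups(individual_estimates, 5))
```

Changing k changes only how many estimates enter each average; in the terms used above, the output appraises aggregation, not interaction.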

Another so-called group is the “concocted” group. It, too, neither meets nor interacts. In the “concocted” group the unique elements of each individual’s products are combined to form a so-called group product. One form of “concocted” group is that in which each individual’s products are summed to form the so-called group product. A second form is represented in Marquart’s experiment of 1955 (42). Individuals’ products are combined so that a “solution” is assigned to a fictitious group if at least one of the members, working individually, arrives at the correct solution. A “no solution” is assigned to the group if none of the “group” members arrives at the correct solution individually.

In the experimental literature, the earliest studies were of the “statisticized” group, followed subsequently and in succession by the “climatized,” the “concocted,” the ad hoc, and most recently the “traditioned” group. Since the “traditioned” group is most like real-life groups, the development may be considered to have moved along a continuum from artificial (“statisticized”) to real (“traditioned”).

The varieties of groups may be broadly classified, then, as follows:

1. Interacting, face-to-face group, i.e., involving group meeting and discussion:
   a. with a tradition of working together (traditioned)
   b. with no tradition of working together (ad hoc)
2. Noninteracting face-to-face group, i.e., involving physical meeting, but no discussion:
   a. with a sequel appraisal of group opinion (climatized)
   b. with a sequel appraisal of individual opinion (social climatized)
3. Noninteracting non-face-to-face group, i.e., involving no meeting and no discussion:
   a. averaging of individuals’ performances (statisticized)
   b. combining of individuals’ performances (concocted)
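The two forms of “concocted” group reduce to simple scoring rules. The sketch below is a hypothetical illustration, not Marquart’s procedure in detail: `concocted_solution` applies the at-least-one-member rule, `concocted_sum` the summation rule, and `expected_concocted_rate` gives the proportion of concocted groups that would be credited with a solution under an added assumption, not made in the text, that each member solves independently with the same probability p.

```python
def concocted_solution(member_solved):
    """At-least-one rule: credit the fictitious group with a solution if any
    member, working alone, solved the problem; otherwise 'no solution'."""
    return any(member_solved)


def concocted_sum(member_products):
    """Summation rule: the so-called group product is the sum of the
    individual products."""
    return sum(member_products)


def expected_concocted_rate(p, n):
    """Expected share of concocted groups of size n credited with a solution,
    ASSUMING each member solves independently with probability p."""
    return 1 - (1 - p) ** n


print(concocted_solution([False, True, False]))  # True
print(expected_concocted_rate(0.2, 4))
```

Under that independence assumption, four members who each solve with probability 0.2 yield concocted groups credited with a solution about 59% of the time, although no member ever spoke to another.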


This broad classification, of course, fails to consider every variant of “group.” For instance, it does not give appropriate consideration to the interacting non-face-to-face group, which has been used to evaluate the effect of different kinds of interaction networks, or of different kinds of information, on individuals trying to achieve some common end. Some of the network research has had as its dependent variable the quality of cohesiveness or the speed of problem solving; some of the information control studies have been concerned with group and individual satisfaction or success in completing a task. None of the research, however, has been oriented toward group vs. individual comparisons. Therefore, these important studies are not considered in the following discussion.

This review also omits studies concerned with group psychotherapy or with group discussions designed to develop insights about individuals’ attitudes. Group psychotherapy in some ways overlaps the “social climatized group,” but its members do, or are expected to, interact. The individuals constituting the group are selected by an outside agent because of his belief that they as individuals will be changed by the nature of their interaction in, and with, the group. There is no group goal, but there is an individual objective for each person in it: amelioration of maladjustments and the achievement of self-understanding.

Similarly, in some forms of opinion research, the experimenter meets with an assemblage of individuals to elicit a gamut of attitudes and values about an issue. The assemblage meets for the analyst’s purpose and not for the individuals’ or the group’s objective, although its individuals may achieve some groupness. In group discussions for opinion research, the group provides an atmosphere which tends to facilitate individual contributions for the experimenter, but it is not oriented toward either any participant’s goals or a mutual task for all of its units.

Therefore, the review is limited to researches contrasting individuals with any of the six groups conceptualized, i.e.: (a) traditioned, (b) ad hoc, (c) climatized, (d) social climatized, (e) statisticized, and (f) concocted. Although some variations may be assigned cavalierly to the nearest broad category, such categorization may nevertheless aid in reviewing specific studies. As such, the classification may provide a clearer basis for interpretations about conclusions involving the multimeanings of group.


CONSIDERATIONS IN INDIVIDUAL AND GROUP COMPARISONS

History of Subjects 

Performances of groups, of course, are contrasted with those of individuals. The individual, too, is a multimeaning concept. The individual may be an executive thoroughly accustomed to making policy or taking action; or the individual may be any young person selected at random from a larger population to participate in an experiment. In the same sense that the “traditioned” group survives because of its joint ability to solve problems, so, too, the functioning executive continues because of his proficiency in setting policy or in making decisions. Comparisons of groups with individuals, indeed, should give full consideration to the similarity of the experiences of the groups and of the individuals. Logically and psychologically, the traditioned group should be compared with the functioning executive; the ad hoc group with a random individual selected from the same supply. Researchers all too frequently fail to appreciate the significance of, and the need for, establishing the equivalence of groups and individuals in the quality level of their background or responsibility for action. Marston (44) demonstrated this in a study reported by Kelly and Thibaut (30). He showed that, in the realm of legal judgments, a collection of untrained individuals may be an inferior judge of events when compared to a trained individual.


Individual Product 


What performances should be involved in comparing the group with the individual? Is the group product to be compared with the average of the individual products? With the average individual? With the best individual? Or with the “summated” individual? The usual procedure contrasts the average individual with the average group, although studies which contrast the best individual, or the “summated” individual, with the best group may lead to different conclusions. Concern with the average disregards the fact that, in general, one (or more than one) individual exceeds the best group and, conversely, that one (or more) individual does worse than the worst group. Other mathematizations may actually be required to compare individual and group products, perhaps measurements based on probabilistic or other systems.
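To see how the choice of baseline changes the verdict, the sketch below places one group score against the average, best, and worst of a set of individual scores. The function name and the scores are hypothetical, and higher scores are assumed to be better.

```python
from statistics import mean


def compare_group_to_individuals(group_score, individual_scores):
    """Place one group score against several individual baselines.
    Assumes higher scores are better."""
    average = mean(individual_scores)
    best = max(individual_scores)
    return {
        "average_individual": average,
        "best_individual": best,
        "worst_individual": min(individual_scores),
        "beats_average": group_score > average,
        "beats_best": group_score > best,
        "individuals_above_group": sum(s > group_score for s in individual_scores),
    }


# Invented scores (e.g., items correct) for eight individuals and one group.
individuals = [12, 15, 9, 18, 14, 11, 16, 13]
print(compare_group_to_individuals(15, individuals))
```

With these invented numbers the group beats the average individual but not the best one: the same result supports opposite conclusions depending on the comparison chosen.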


Motivation 


In such comparisons, motivation, too, is often ignored. The view frequently cited in the literature is that meeting in a group stimulates participation and discussion, as well as interest in, and acceptance of, the experimental task. For instance, if two or more members of a five-member group reject the experimental situation or task, the group may still emerge with some final product. By contrast, when an individual is not motivated to accept a situation or a task, there is no product, or certainly not a representative one, from that individual. A group’s product, therefore, may exceed that of an individual primarily because of differential task acceptance.

In addition to differential motivation, there is also the possibility of differential acceptance of responsibility in the experimental situation. The degree to which a feeling of responsibility for the decision affects the content and quality of group, and of individual, decisions is not known, but it should be recognized in evaluating results. Obviously, the effect of responsibility in experimental situations cannot approximate that of the real situation.

Tasks 

Not only do “group” and “individual” vary in setting and in motivation, but a partial confounding, in the statistical sense, also exists between task and kind of group. For instance, studies using “statisticized” groups tend to use tasks requiring estimating or judging. “Estimating” refers to estimating the number of items, the length of lines, the weight of substances, and other things perceptible to the senses; e.g., Bruce’s (8) Ss estimated the numerosity of buckshot on a card, and Schonbar’s (53) Ss estimated line length.

The “climatized” group tends to be used primarily in “learning” experiments on improvement in knowledge of subject matter or the mastery of a skill; e.g., Gurnee (24) measured improvement in the mastery of a maze.

The ad hoc group is most commonly used in studies of “problem-solving,” a term taken here to mean the thinking out of the correct answer to a problem. Shaw’s comparison (55) of problem-solving by individuals and by ad hoc groups is illustrative. Her problems were mathematical puzzles for which there is a known (or, at least, a knowable) solution. Such problems are less characteristic of life situations, since few real situations occur for which the solution or the decision is known or completely knowable. The trend in research has been away from the puzzle with a Eureka solution toward more realistic problems, where adequacy or goodness (not correctness) is the criterion, e.g., Maier’s (41) human relations problem.

In this review, the range of problems from the puzzle to the human relations situation to the policy decision has been distinguished. While it is difficult to state wherein the processes needed to solve puzzles differ from those required to establish policy, it is felt that the nature of the potential feedback is not the same for all kinds of problems. Eureka problems can be evaluated as right or wrong, but human relations problems must be evaluated in terms of relative goodness, i.e., the range of considerations in the solution is evaluated, e.g., Maier’s (40) parasol assembly problem. His problem had no correct or unique solution for adjusting the slow worker who is the bottleneck in an assembly line; rather, the several alternative plans for action must be appraised for “elegance” in terms of likely consequences.

Criteria 

The truly “traditioned” group, as such, has not been used in researches on the quality of group product, although some approximations have been made in studies of quantity of “productivity,” e.g., Coch and French (11) measuring output of factory workers.

The multiplicity of different experimental tasks leads to a multiplicity of criteria. In “estimating,” the criterion is the true order or true number; in mathematical problems or in puzzle solving, it is the right answer; in “learning,” it is improvement; in “judging” and in complex problems, it is the consensus of experts about the order of merit of the material or the quality of the decisions.


Side Effects 


Studies of problem solving vary not only in the nature of the group used, the kind of problem or task worked on, and the criterion, but also in their concern with the side effects of group participation. These include personal gains from the experience, commitment to decision, personality development, or personal growth in empathy for the needs and feelings of others. This review is primarily concerned with studies estimating the quality of group, and of individual, products (although relevant studies of side effects will be considered). The review excludes group process as such, emphasizing only those studies contrasting the quality of the product from group interaction with the quality of the product by the individual.

Undoubtedly, there are many situations for which side effects are the major concern, and for which a group dynamics program has been instituted. Yet in the military, in education, and in industry, it is quite frequently the quality of the group product that is the major desideratum, and such side effects as commitment, morale, or feelings of participation are frequently of less importance.

One common inadequacy of all reviewed studies lies in the choice of Ss: most tend to use any Ss available. The absence of studies with truly functioning groups contrasts sharply with the very large number that use college students. Among the major findings of this review of experimental work on group products has been the recognition of the relatively narrow base for the consistently broad generalizations about groups in complex situations, not only because of the narrow range of the kinds of Ss but also because of the narrow assortment of puzzles, games, riddles, and judgmental tasks. Insofar as the generalizations derive from the “statisticized,” the “climatized,” and the “ad hoc” group rather than from the functioning “traditioned” group, the generalizations in the main are founded on the behavior of college students, with their less certain motivations and responsibilities, rather than on the behavior of adults working under the genuine tensions and pressures of life. Thus, the generalizations will be limited and possibly not too realistic. Generalizations, psychologically, may be limited by the kind of group, the nature of the population, the kind of task, and the basis of estimating correctness, goodness, or adequacy.


JUDGMENT 


Judgment, in its long use in psychology, has often been used in research to contrast group with individual products. One type is exemplified in the work of Sherif (56), where the individual qua individual makes judgments in the presence of others, to get an estimate of the effect of the group setting either upon the group’s judgment or upon that of the individual. Another type contrasts the quality of judgments by the group with those by individuals to the same stimuli. In many reported contrasts of the judging of the “group” with the “individual,” the “group” in the dynamic sense never existed; rather, it was an average of several judgments made by noninteracting and separate individuals, i.e., a “statisticized” group (3).


Judgments by Statisticized Groups 


The earliest use of a “statisticized” group was by Hazel Knight in 1921 (33). In her best-known experiment, college students estimated the temperature of a classroom. The judgments of the individuals ranged from 60° to 85°; the “statisticized” group judgment was 72.4°, approximating the actual room temperature of 72°. The “statisticized” group judgment was better than 80% of the individual judgments, even though 20% of the latter were as good as, or superior to, the “statisticized” group. In a distinctly different experiment, the Ss, in the absence of any other information, ranked 12 children for intelligence from their photographs. Each S ranked the children independently; then a “statisticized” group rank order was obtained. The “group” rank order did not correlate with actual intelligence test scores any better than the individual rank orders did. Finally, an ad hoc group of 10 Ss met together to discuss each photograph in order to obtain ranks by an interacting, deliberative ad hoc group. The ad hoc group rank order was significantly more accurate than either the individual or the “statisticized” group ranking. While a single ad hoc group is too small a basis for generalization, Knight developed a new approach to group versus individual judgment.
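Knight’s comparison, in which the group mean is better than some share of the individual judgments, can be computed directly. In this sketch the estimates are invented; only the 60° to 85° range and the 72° true temperature echo her report, and the function name is hypothetical. An individual counts as beaten only if that individual’s error is strictly larger than the error of the mean, so an estimate “as good as” the group is not counted as beaten.

```python
from statistics import mean


def share_beaten_by_mean(estimates, true_value):
    """Return the proportion of individual estimates whose absolute error is
    strictly larger than the absolute error of the mean of all estimates."""
    group_error = abs(mean(estimates) - true_value)
    beaten = sum(abs(e - true_value) > group_error for e in estimates)
    return beaten / len(estimates)


# Invented temperature estimates (degrees F); the true value is taken as 72.
estimates = [60, 68, 70, 71, 72, 73, 74, 76, 80, 85]
print(share_beaten_by_mean(estimates, 72))
```

Here the mean (72.9°) is closer than nine of the ten individuals; the one exact guess shows how an individual can still match or beat the statisticized figure.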

In 1923, Gordon (20, 21) began publication of her series of studies extending Knight’s technique of the “statisticized” group. College students ranked weights, and their rankings were appraised against the criterion of true order. The average of the 200 correlations for that many individuals was .41. Averaging any five random individual rankings at a time, she obtained 40 “statisticized” group rankings, which correlated with true order .68. Similarly, “statisticized” groups of 10, 20, and 50 were computed. For the four “groups” of 50, the average correlation rose to .94. Gordon, nevertheless, reports that among her 200 individual correlations, five were at least as high as .94. Her primary conclusion was that “results of the group are distinctly superior to the results of the average member and are equal to those of the best member.”
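Gordon’s procedure can be mimicked in a small simulation. Everything in it is an assumption for illustration, not Gordon’s data: ten weights spaced one unit apart, judges who perceive each weight with independent Gaussian error, and the helper names. Individual rankings are averaged item by item, the averages are re-ranked, and each ranking is correlated with the true order by Spearman’s rank correlation.

```python
import random


def ranks(values):
    """Rank values from 1 (smallest) upward. Ties are broken by position,
    which is adequate for this continuous-noise sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def statisticized_ranking(individual_rankings):
    """Average each item's rank over all judges, then re-rank the averages."""
    n_items = len(individual_rankings[0])
    averages = [sum(r[j] for r in individual_rankings) / len(individual_rankings)
                for j in range(n_items)]
    return ranks(averages)


def simulate(n_judges, n_items=10, noise_sd=3.0, seed=0):
    """Hypothetical model: each judge perceives every true weight plus
    independent Gaussian noise and ranks the perceived weights.
    Returns (mean individual correlation, 'group' correlation)."""
    rng = random.Random(seed)
    true_weights = list(range(1, n_items + 1))
    true_order = ranks(true_weights)
    judges = [ranks([w + rng.gauss(0, noise_sd) for w in true_weights])
              for _ in range(n_judges)]
    mean_individual_r = sum(spearman(j, true_order) for j in judges) / n_judges
    group_r = spearman(statisticized_ranking(judges), true_order)
    return mean_individual_r, group_r


for k in (1, 5, 10, 20, 50):
    individual_r, group_r = simulate(k)
    print(f"k={k:2d}  mean individual r={individual_r:.2f}  group r={group_r:.2f}")
```

Because each simulated judge’s errors are independent of every other judge’s, the “group” correlation climbs toward 1 as k grows with no social process at all, which is the point Stroop later pressed.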

The Knight technique of statisticizing group judgments was used by Smith in 1931 (59). He formed groups of 5, 10, 20, and 50 undergraduates who worked individually on the task of judging personality and behavior traits of children from written reports of their behavior. Although the correlations against the criterion (Smith’s own judgment of the correct order) increased as the size of group became larger, the increase was not as great as in Gordon’s study. The average correlation for the 50 individuals was .37, versus .51 for the one “statisticized” group of 50. Six individuals exceeded the “statisticized” group correlation. Smith attributes the low correlations to the great number, as well as to the ambiguity, of the traits, rather than to shortcomings in group judgment.

Judgments of weight as well as of the numerosity of buckshot were made by Bruce’s (8) Ss in 1935. The average of the 120 correlations for individuals with actual weights was .50; the average for two “statisticized” groups of 60 was .88. For the visually presented buckshot, the average of the 120 correlations for individuals was .82; for the two “statisticized” groups of 60 it was .95.


Eysenck (16) used “group” techniques in 1939, when his Ss judged the beauty of 12 pictures against the criterion of the average judgment of 700 students. An experimental group of 200 was selected from the same college population. The average judgment of the entire 700 was considered the “expert.” Correlations for the 200 individuals against the “expert” averaged .47; four “statisticized” groups of 50 averaged .98; and for one “statisticized” group of 200 the correlation became unity. Eysenck also reported a table of the increment in correlations as a function of the number of judges utilized in “statisticized” groups.

In 1945, Klugman (31), using the Knight method, had high school students judge the number of several kinds of items in a bottle: “familiar” (jacks, marbles) and “unfamiliar” (lima beans, marrow beans). For the unfamiliar items, the one “statisticized” group of 60 was significantly closer to the true value than was the average of the individuals. On the familiar items, by contrast, there was no significant difference. Klugman concluded “when items are unfamiliar group judgment is significantly better than most individuals while on familiar items only a tendency appears.”

In another Klugman study (32), soldiers estimated the dates of the ending of the war with Germany and with Japan. For the German armistice, of the 109 individuals who were tested, 27 were closer to the actual date than the “group” mean; for the Japanese armistice, 59 were closer. For the German armistice, he found a significant difference between the percentage of individuals with errors greater than the “group” error and the percentage with errors less than the “group” error. This difference he “interpreted to mean that the group judgment was better.” For the Japanese armistice, however, there was no such significant difference, though the direction favored the individuals. In conclusion, Klugman quotes Poffenberger: “one cannot say categorically that a group opinion will or will not be better than the opinions of the individuals that comprise the group.”

Not until 1932 were the obvious defects of Knight’s so-called “statisticized” technique criticized. Stroop (62), after verifying Gordon’s results by repeating her experiment, adapted it by requiring a single individual to make 50 separate judgments; i.e., his four “statisticized groups” of 50 were four individuals who had each made 50 judgments of the same stimulus. When he combined 5, 10, 20, and 50 judgments of the same individual, he obtained correlations with the criterion nearly identical with those that Gordon reported for combining 5, 10, 20, and 50 judgments of different individuals. Stroop argued that Gordon’s results, rather than demonstrating the social psychology of “grouping,” merely illustrated an obvious statistical principle: the reduction of error variance.
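Stroop’s statistical principle can be stated in a line. Assume, as a simplification not made explicit in the studies themselves, that each of k judgments is the true value μ plus an error e_i with mean zero and common variance σ², the errors being independent:

```latex
\begin{aligned}
X_i &= \mu + e_i, \qquad \mathbb{E}[e_i] = 0,\quad \operatorname{Var}(e_i) = \sigma^2,\quad i = 1,\dots,k,\\
\bar{X}_k &= \frac{1}{k}\sum_{i=1}^{k} X_i
\quad\Longrightarrow\quad
\operatorname{Var}\!\left(\bar{X}_k - \mu\right) = \frac{1}{k^{2}}\sum_{i=1}^{k}\operatorname{Var}(e_i) = \frac{\sigma^{2}}{k}.
\end{aligned}
```

Nothing in the derivation asks who produced the k judgments, so under these assumptions averaging 50 judgments by one person and 50 judgments by 50 people buys the same reduction in error variance, consistent with Stroop’s nearly identical correlations.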

Farnsworth and Williams (18) in 1936 demonstrated that Knight’s results were unrelated to the fact that the individuals made their estimations in a group setting. They repeated her experiment in every detail except that each individual estimation was made in isolation. The accuracy of their “statisticized group” results was almost identical with Knight’s. In another experiment, they attempted to show that improvement by “grouping” was not a general principle but applied only to judgments about familiar material. In a test using the size-weight illusion, subjects hefted two boxes and then estimated the weight of a third, constructed to be lighter than either of the others although larger in bulk. The estimations were made by individuals, from whose data the “statisticized group” estimations were computed. These “group” estimations did not approach the true value, leading Farnsworth and Williams to conclude that when the material is unfamiliar, or distorted in such a way that all individuals are prone to make similar errors of estimation, the “statisticized” group estimation is not likely to be any closer to the true value than are individual estimations. Klugman’s first study (31), indeed, was an experimental investigation of the Farnsworth-Williams generalization, but, contrary to Farnsworth and Williams, he found that when Ss are unfamiliar (as he defines “unfamiliarity”) with the object to be estimated, the “group” estimation is significantly different from that of the individual.
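The statistical reasoning behind the Farnsworth-Williams generalization can be written out in a few lines. As a sketch in our own notation (the authors give no formula), write μ for the true value of a stimulus, b for a bias shared by all judges (such as the distortion produced by the size-weight illusion), and e_i for the independent error of judge i, with mean 0 and variance σ². The mean of k judgments then behaves as follows:

```latex
\begin{aligned}
x_i &= \mu + b + e_i, \qquad \mathbb{E}[e_i] = 0, \quad \operatorname{Var}(e_i) = \sigma^2,\\
\bar{x}_k - \mu &= b + \frac{1}{k}\sum_{i=1}^{k} e_i,\\
\mathbb{E}\big[(\bar{x}_k - \mu)^2\big] &= b^2 + \frac{\sigma^2}{k} \;\longrightarrow\; b^2 \quad (k \to \infty).
\end{aligned}
```

Pooling removes only the independent part of the error. A bias that all judges share survives any amount of averaging, which is consistent with the finding that the “statisticized” estimates of the illusory box did not approach its true weight.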

Despite Gordon’s defense in 1935 (22), her contention that “mere grouping ranks does not produce correlations,” critique of her methodology was continued by Dashiell in 1935 (16), Preston in 1938 (49), and Smith (58) in 1941. The recent criticism has emphasized that, regardless of the statistical argument, experiments in which groups never meet can add little to the understanding of group process in social psychology. Preston (49), for instance, suggests that, whatever the Gordon results do show, they give no evidence either for psychological process or for group interaction. These studies have been cited not only because the technique has been used widely, but also because the Gordon and the Knight studies in particular are used as evidence for the values of group process.


Judgments by Interacting Group Members


What are the results from judgments by groups with genuine interaction among their members? The earliest study is that of H. E. Burtt (9) in 1920 with testimony. Individual Ss heard testimony by “stooges,” some of whom were lying and some telling the truth. Each S judged which were truthful and which were not. The individual votes were tallied and announced immediately. The Ss were then constituted as a single interacting group of “jurors,” who, after discussing the testimony, voted again as individuals. On the first vote, 48% were correct; after discussion, the percentage was not different. Of the 25 shifts in vote, 14 were in the right and 11 in the wrong direction. Burtt concluded that while discussion alters judgments, it does not necessarily improve them.

Dashiell (15) reports a study by Bechterev and Lange in 1924 in which individual judgments were made for a variety of tasks, ranging from the time interval between two sounds to the justification for a man beating a boy who had stolen from him. After the individual judgments had been made and summarized, the results were presented for discussion; after that, a second individual judgment was made. Their results seem to be consistently in favor of postdiscussion judgments. Bechterev and Lange maintain that the group process is beneficial for all individuals, although those who have less to offer the group gain most by it. Dashiell states that Bechterev and Lange’s report is not clear about the extent of actual discussion, or even whether the reading of summaries was itself the discussion.

In 1932, Arthur Jenness (29) investigated the effect of discussion in ad hoc groups on the accuracy of individual judgments. Ss as individuals estimated the number of beans in a bottle; then they discussed the estimates in ad hoc groups of three and made a group estimate; finally, each made a second, postgroup individual estimate. In two different experiments, Jenness formed the ad hoc groups in different ways: in one, the individuals were chosen to make for maximum disagreement in the groups; in the other, the individuals were chosen to assure maximum agreement. With ad hoc groups selected for maximum disagreement, group estimates were less accurate than the average of individual estimates had been, but the members’ individual postdiscussion judgments were better in 20 of 26 instances, with a 60% average reduction in error.

In a control experiment, in a class whose members made two individual estimates without any intervening discussion, there was an average reduction in error of 4%. When the ad hoc groups were selected for maximum agreement, however, the group estimates were more accurate than the first individual estimates had been, but the postdiscussion individual judgments were not significantly different from those of the control. In a fourth aspect of the experiment, after the initial individual judgments had been made, the results were read to the class, whose members were then allowed to form groups as they wished. The results closely parallel those for groups selected for maximum disagreement. Jenness (29) concludes that discussion does not make group estimates more accurate, but stresses the importance of knowledge of differences among judges in improving group judgments. He also introduced a method of estimating gain from group participation, i.e., the gain made by the individual subsequent to the group result, an appraisal too frequently ignored in the experimental literature even though such gain is usually suggested as an advantage of group process for education and for industry.
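Jenness’s measure of individual gain can be expressed as the change in each person’s error from the prediscussion to the postdiscussion estimate. The sketch below is illustrative only: the function names and bean counts are hypothetical, and it assumes that “average reduction in error” means the mean of the individual percentage reductions, which may not be exactly how Jenness computed it.

```python
def percent_error_reduction(true_value, before, after):
    """Percentage by which one individual's absolute error shrank after discussion.

    Negative values mean the postdiscussion estimate was worse.
    """
    err_before = abs(before - true_value)
    err_after = abs(after - true_value)
    if err_before == 0:
        raise ValueError("initial estimate was exact; no error to reduce")
    return 100.0 * (err_before - err_after) / err_before


def summarize_gain(true_value, pairs):
    """Return (number who improved, mean percentage reduction in error).

    pairs: (prediscussion estimate, postdiscussion estimate) per individual.
    Assumption: 'average reduction' = mean of the individual percentages.
    """
    reductions = [percent_error_reduction(true_value, b, a) for b, a in pairs]
    improved = sum(1 for r in reductions if r > 0)
    return improved, sum(reductions) / len(reductions)


# Hypothetical data: 800 beans in the bottle, four subjects.
improved, mean_reduction = summarize_gain(
    800, [(500, 700), (1200, 900), (750, 780), (900, 950)]
)
print(improved, round(mean_reduction, 1))  # 3 of 4 improved; mean about 37.9%
```

Reporting both the count of improved individuals and the mean reduction, as Jenness did (“20 of 26 instances”), guards against a single large change dominating the average.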


Judgments by Noninteracting Group Members


In 1937 Herbert Gurnee (23) attempted to evaluate only the effects of discussion by contrasting the judgments of individuals with those of noninteracting face-to-face groups, using a subsequent measure of group opinion. Individuals made their judgments on a written true-false examination. The same statements were then put to them in groups of 53, 57, 66, and 18, where each judgment was made by acclamation, with a show of hands when necessary. In each experiment, the group was better than the average individual and approximately equal to the best individual. Gurnee also computed “statisticized” groups, but found that four of his five face-to-face groups were superior to the corresponding statisticized results. He reports a social influence upon the doubtful, in that those who were more certain of their judgments often carried the doubtful with them. His general conclusion was that although the group will be superior, the amount of superiority is unpredictable, because the gain depends on how well the individuals in it do: a task difficult for its constituent members will also be difficult for the group.
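For a true-false test, one natural way to form a “statisticized” group is to take the majority answer to each item across the members’ independent answer sheets. The sketch below assumes that rule (Gurnee’s exact computation is not described here) and uses a hypothetical five-member, six-item answer matrix in which each member’s errors fall on different items, the favorable case for pooling. When members share errors on the same items, majority voting cannot correct them, which echoes Gurnee’s point that a task difficult for the members is difficult for the group.

```python
from statistics import fmean


def score(answers, key):
    """Number of items answered correctly."""
    return sum(a == k for a, k in zip(answers, key))


def majority_answers(sheets):
    """Item-by-item majority of True/False answers (assumes an odd number of members)."""
    n_items = len(sheets[0])
    return [sum(sheet[i] for sheet in sheets) * 2 > len(sheets) for i in range(n_items)]


# Hypothetical data: answer key and five members' independent answer sheets.
T, F = True, False
key = [T, F, T, T, F, F]
sheets = [
    [T, F, F, T, F, T],
    [T, T, T, F, F, F],
    [F, F, T, T, T, F],
    [T, F, T, F, F, T],
    [T, T, F, T, F, F],
]

individual_scores = [score(s, key) for s in sheets]
group_score = score(majority_answers(sheets), key)
print(fmean(individual_scores), max(individual_scores), group_score)
```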
The improvement in accuracy of group judgment was demonstrated by Rosalea Schonbar (53), who reported that pairs of Ss were more accurate than individual Ss in estimating line length, since in interaction the pairs seemed effective in cancelling out over- and underestimates.
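The arithmetic of such cancellation can be stated exactly under a simplification: Schonbar’s pairs reached their estimates by interaction, whereas this sketch treats the pair’s estimate as the mean of the two individual estimates. Writing t for the true length and a₁, a₂ for the two members’ estimates (our notation), the triangle inequality gives:

```latex
\left|\frac{a_1 + a_2}{2} - t\right|
  = \frac{\lvert (a_1 - t) + (a_2 - t) \rvert}{2}
  \le \frac{\lvert a_1 - t \rvert + \lvert a_2 - t \rvert}{2}
```

The inequality is strict exactly when one member overestimates and the other underestimates. For example, with t = 10, a₁ = 12, and a₂ = 9, the members’ errors are 2 and 1, an average of 1.5, but the mean estimate of 10.5 is off by only 0.5.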

In his review, Dashiell (15) reports an experiment of his own. He compared the written reports of two different witnesses to a staged classroom incident with those written by legal psychology students after hearing an oral version of the incident. The legal psychology students first reported as individuals their conception of the original event, and then made a subsequent report as a group. None of the seven students as individuals gave an account as complete as either of the two witnesses; most of the individual reports were intermediate in accuracy between those of the two witnesses. The group report was less complete but more accurate than that of either witness and of all but one of the seven individuals.


Generalizations 


What generalizations can be made about group and individual judgments? Generalization is more difficult than the earlier work based upon the “statisticized” groups had implied. Increased accuracy of judgment is not obtained by the simple expedient of convening people into a group. The results of Farnsworth and Williams (18) and Klugman (32) have shown that for some types of material a group judgment does not differ significantly from that of the average individual; and Jenness (29) as well as Gurnee (25) indicated that group superiority depends upon the quality and the range of the judgments of the individual members of the group. At best, group judgment equals the best individual judgment, but usually it is somewhat inferior to the best individual. Bechterev and Lange (7) have shown that the individuals making the poorer estimates benefit more from interaction in a group than those making the better estimates. Verbal interaction, however, does not seem essential to improvement, for Gurnee (25) as well as Bechterev and Lange (7) obtained significant improvement without discussion. Regardless of the shortcomings of the “statisticized” group technique, the experiments with face-to-face groups also showed improved group judgment. This predicted superiority of groups is more probable when the material is unfamiliar or when there is an extensive range of opinion in the group.


LEARNING 


In using learning as a basis for contrasting groups and individuals, the studies are usually less rigorous than those based upon judging or estimating. The lack of rigor comes from the ambiguity of terms, e.g., “the lecture system,” “class discussion,” and “study group,” as well as from the semantic confusion in “group” and “individual.” Furthermore, the concepts of “change,” “improvement,” and “growth” are used without reference either to a standard of “greater” or “more” or to statistical significance. Many reports are testimonials by classroom teachers for methods they have used rather than experiments. As is usual in evaluating methods of instruction, variability in the quality of the instructor or of the instruction is ignored.

G. Ryan (52) made one of the first studies using learning as a basis for evaluating group and individual achievement. She divided each of four college levels into halves equivalent in intelligence. One of each pair of equivalent halves studied English and education as individuals for a six-week period, with the instructor available for consultation; the other studied as a “regular” class. At the end of the first six weeks, the halves reversed roles; and for a final six weeks both halves studied together as a “regular” class. At the end of each six-week period, comprehensive achievement tests were given, with the general result that those who had studied as a “regular” class did better. Despite such results, however, Ryan concludes that when time spent on study is equated, independent study was superior for freshmen, sophomores, juniors, and seniors and for all ability levels. This interpretation rests on the assumption that independent study took less of the instructor’s time than did class instruction. Ryan seems to gloss over significant aspects of her results by implying that one goal of education is saving the teacher’s time.

In 1925 Bane (5) reported results comparing the “lecture method” and the “class discussion” technique. Ss were college students in education and psychology. In each of five “experiments,” those taught by “class discussion” did significantly better on tests of delayed recall. On immediate recall, however, the “class discussion” groups did worse in three of the five.

In 1926 Barton (6) gave two equivalent sections of students the same preliminary instruction in first-year algebra problem solving. One section was then assigned new problems to solve as individuals; the other solved the new problems by class discussion. Two posttests of problem solving in algebra favored class discussion.

In 1928 Spence (61) compared the efficiency of learning by the “lecture” system with that by the “class discussion” system. Two large classes of approximately 150 Ss each were compared. The first section took an initial test, studied under the “lecture” system for one semester, took a second test, then studied under the “class discussion” system during the second semester, and took a final examination. The second section followed the same schedule, except that it reversed the order of “lecture” and “class discussion” study. The test results indicate superiority for students in “lecture” classes. During the first semester those who had the usual “lectures” forged ahead. During the second semester, those who had previously studied under the “class discussion” method made up the lost ground. The large size of these classes should be borne in mind. These results may be valid for extremely large classes, but different results may be obtained as class size varies.

The next three experimenters agree on the beneficial side effects of “class discussion” in comparison with the traditional “lecture” method, but disagree as to which is the superior learning system.

Thie (64), using high school English students, contrasted two halves equivalent in ability, one half working as a usual class with instruction by lecture and the other studying in groups of five members each. Improvement was measured by the difference between pre- and post-term scores on a reading test and on the writing of an original paragraph. Not only did the half that had studied in groups show greater improvement on both tests, but these students also showed greater gain in self-sufficiency, as appraised by amount of voluntary work, by individual activity, and by reported enjoyment of the course. Of the 24 students in groups, 16 registered for another term of English, in contrast with only one of the 24 in the lecture class. Though the so-called measures of self-sufficiency are not adequately defined, Thie was one of the first to suggest that the benefits of small-group techniques in the classroom may be underestimated when evaluation neglects side effects and emphasizes content achievement only.

In 1927 Zeleny (73) made much the same point in the first of his studies on the discussion-group method of teaching, with college students in sociology. The experimental classes were formed into groups of seven, who were given written assignments and a syllabus. The instructor gave help to the seven-member groups as needed. The control classes were taught by “traditional lecture.” On terminal tests of factual knowledge and of opinion there were no significant differences between the group-discussion method and the traditional lecture method. Zeleny suggests that the expected values of group discussion lay not in content mastery but rather in greater teacher-student cooperation, increased mutual tolerance of one another’s views, and better working together with others, without sacrifice in subject mastery.

In a second study of group learning, Zeleny (74) in 1940 matched two classes for age, sex, intelligence, and subject-matter proficiency. These were taught by the same instructor: one by lecture-recitation, the other in discussion groups of five students each. On gains in content knowledge, there were no differences between the two. The discussion groups, however, were superior to the lecture-recitation class in participation, in personality development, in social adjustment, and in cooperation. Essentially, the results corroborate Zeleny’s earlier suggestion that the advantage of group techniques for school learning lies more in personality changes than in mastery of academic skills and knowledge.

In 1951 Asch (2) conducted an experiment to compare the over-all effectiveness of nondirective teaching (group participation), using the counseling methods of Rogers, Combs, Snyder, etc., with that of the usual lecture method. Four undergraduate sections of general psychology served as Ss. The experimental section was informed that no tests or final examination would be required; the control sections worked toward a final examination. The groups were compared for knowledge of subject matter, social attitudes, emotional adjustment, and over-all evaluation of the course. The results indicated that the control group was superior to the experimental group in knowledge of subject matter. However, the two groups were not similarly prepared to take the final examination. In their personal evaluations of the course, a number of students suggested that nondirective teaching encourages greater amounts of outside reading, stimulates thinking about basic conceptual material, and makes for more independent decisions based on the knowledge of many individuals and not just one “authority.” No differences were found between the directive and nondirective groups in social attitudes as measured by the Social Distance Score. A comparison of MMPI scores indicated that the nondirective group improved to a significantly greater degree than the control group in emotional adjustment. Finally, an analysis of the Course Evaluation Forms completed by each S indicated that the Ss in the experimental section found their course more helpful in teaching the subject matter than did the members of the control group.

In 1937 Gurnee (24) reported the first of two “learning” experiments which, though more rigorous in control of basic variables, used somewhat less realistic problems. Half of his Ss worked as individuals, while the other half worked in groups averaging 10 members. The task was to learn a maze. Individuals were to concentrate on eliminating errors without concern for time; groups voted each step in the maze by acclamation. Groups and individuals had six trials. Groups did significantly better than individuals, making fewer errors and completing the first perfect trial sooner. Gurnee then tested all Ss as individuals on a seventh trial. He did not find any significant difference as a result of the two different kinds of experience. His results may be subsumed under Jenness’ generalization that when groups are in agreement, group members will not improve as a function of group experience.

The next year Gurnee (25) reported a similar study with quite different results. In addition to the same maze, he added a task in which Ss learned, over six oral trials, which number of each of 20 pairs of two-place numbers was arbitrarily designated correct. Individuals were contrasted with groups; then a seventh, written trial was given to all as individuals. On the seventh trial, those who had worked in groups made significantly fewer errors than those who had worked the first six trials as individuals.

Moore and Anderson (45) contrasted the learning of ad hoc three-man groups with that of individuals in applying some of the laws of the calculus of propositions to solve 10 different symbolic logic problems. The results are based on six individuals and six matched groups. In general, none of the differences is significant; i.e., the number of steps taken, the number of errors made, and the time to solution did not differ statistically between individuals and groups. There was a greater tendency, however, for individuals to repeat steps, suggesting that the members of a group remember the steps already taken. It is not surprising that estimates of variance for groups and for individuals usually are not significantly different, but the direction is always toward greater variance among individuals. Nevertheless, in the use of symbolic logic Moore and Anderson have introduced a novel learning task for use by psychologists.

Contrasting the results by groups and by individuals in “learning” suggests quite amorphous generalizations. Spence indicated that the lecture system is superior to the class discussion system for large classes. Ryan’s results agreed. Asch’s results, in a narrow sense, must be similarly interpreted. Thie, on the other hand, found that under his experimental conditions “class discussion” produced significantly better learning than the “lecture” method. Ryan found class discussion superior for certain types of learning and inferior for others. Zeleny and Gurnee found no significant differences between the two forms of learning. These amorphous results suggest several explanations, the most likely of which is that these experiments were conducted under such varying conditions that seemingly diametrically opposed results are understandable. For example, the size of groups can be expected to have a profound effect on results. It is known that as group size increases, individual involvement decreases and inhibition increases. Large discussion groups, therefore, might be expected to produce less learning than smaller groups. Other factors, such as announced goals, subject matter, and methods of measuring improvement, can also be expected to have profound effects on the results. This indicates a serious need for additional experimentation in which these important conditions are controlled.


SOCIAL FACILITATION

Social facilitation refers to the effects on an individual of working at a task in the presence of other individuals but independently of them, i.e., not interacting with them, although they may be face-to-face in an audience or classroom. In social facilitation experiments, there is no interaction or cooperation and, often, no expressed feeling of rivalry, although competition may affect the results. Allport (1) used graduate psychology students, who in a first period took a free association test alone and then, in a second period, took it individually in “groups” of three to five persons. Fourteen out of 15 individuals produced more words while working in the social setting than when working alone, although the differences were not statistically significant. Allport found that the effect of the so-called co-working “group” on individual productivity was to increase the quantity but decrease the quality of the associated words. He concluded that some tasks may be better done alone than in groups. His basic technique was used through the years by Sims (57), Sengupta and Sinha (54), and others with almost no variation in results. The assigned tasks were always repetitive and meaningless, e.g., letter cancellation in running text. In Sengupta and Sinha’s study, for example, Ss who worked at a task for nine days showed little change in output after the third day. When the work situation was changed into a social setting, output rose significantly until it restabilized at a second, higher level. Mukerji (46) found that with children doing letter cancellation and letter naming, almost 90% of the individuals had superior outputs in the social setting, but that oscillation in production was greater when performing in groups.

In 1952 Wapner and Alper (69) described a restraining force resulting from the presence of an audience. One hundred twenty Ss were tested in three different situations. All were asked to select which of two words best fit a given phrase. In the first situation only the S and the experimenter were present. In the second situation, the S and the experimenter were present, but the S was informed that an “unseen” audience was listening to and watching his performance. In the third situation, the S and the experimenter were present with a seen audience. Either task-oriented or ego-oriented instructions were given the Ss. In the task-oriented instructions the Ss were informed that the material rather than the S was being studied. In the ego-oriented instructions the Ss were informed that the task was a form of personality test and that they, rather than the task, were being evaluated. The results indicate that the time to make a choice was longest in the presence of an “unseen” audience under both forms of instructions, next longest in the presence of a “seen” audience, and shortest when there was no audience. The significant differential effects of the audience variable occurred only for the first half of the experimental sessions. Items with personality references yielded longer times than neutral items. Contrary to the expectations of the experimenters, there were indications that the time to make a choice is longer for task-oriented than for ego-oriented instructions.

An extensive inquiry into the effect of co-workers on productivity is reported by Dashiell (14). He investigated the conditions of working: (a) alone; (b) together but noncompetitively; (c) together and competitively; and (d) alone, but under observation. Speed increased for each of the three tasks (multiplication, serial association, and a mixed relations test), particularly between conditions (a) and (d). Accuracy, on the other hand, was much more evenly distributed among the four conditions, the only clear difference being in the “observation” condition, for which the work was least accurate although the greatest amount was produced.

Kelly and Thibaut (30) reported a study by Wyatt, Frost, and Stock (72) in 1934 which indicated that in real-life situations involving work of a highly repetitive nature, social facilitation effects have been found, consisting of closely similar production curves for employees working together. The authors found that workers’ rates of output varied with the output of others in the work group. This relationship was particularly close for pairs of workers seated opposite each other, and was somewhat more marked the more visible and the more measurable the output. When individual workers were subsequently isolated, the correspondence between their work and that of the others disappeared.

Hilgard, Sait, and Magaret (27) indicated that not only actual production but also level of aspiration can be affected by social facilitation. The Ss worked in groups of three to six members. They individually worked on successive subtraction of three-place numbers. The material was graded in difficulty in order to produce experimental differences in success. After the first experimental session, when all Ss’ scores were known to each other, they were asked to estimate their future performance. Those Ss ranking superior in relation to their social group tended to estimate their future performance too low, while those Ss making inferior scores tended to estimate their future performance too high. Though the critical ratios were low and caution was recommended in interpretation, the trends within the groups were clear. The authors speculated that the desire for social conformity might well produce this regression of predicted scores toward the mean.

To a degree, the influence of co-working members may be even stronger in a group interacting for a common objective. Ego-involvement in the group’s product may be a significant factor, as in problem solving for a group result.


PROBLEM SOLVING 


In problem solving, few experimental studies contrast the quality of solutions by groups and by individuals. Most results seem to be by-products of investigations of the problem-solving process. Nevertheless, these few studies are those most frequently cited as evidence of group superiority. One example is Watson’s (70) comparison of groups and of individuals in problem solving. He used ad hoc groups of college students given the task of making as many shorter words as possible from the letters of a longer word within a time limit. For the first trial, Ss worked as individuals; for the second and third trials they worked in 20 ad hoc groups of from 3 to 10 members; and for a final, fourth trial they again worked as individuals. The group product, i.e., the number of different words, was significantly larger than that made by the best individual and thus, obviously, larger than that of the average individual. When Watson (70) formed what may be called the “concocted group” or “summated individuals,” i.e., added together all the different words made in the first trial by the individuals comprising the groups in Trials 2 and 3, he found the average “concocted group” product significantly larger than the average ad hoc group product. Even though the average product of the ad hoc groups significantly exceeded the product of the average individual or of the best individual, it was nevertheless significantly inferior to the full resources of all of its individual members. Group interaction may inhibit the fullest potential contribution by its members. Indeed, the superiority of the “concocted” group over the interacting ad hoc group suggested such an inhibition.

In a subsequent study Watson (71) evaluated group and individual superiority on nine different tasks: finding antonyms, solving a cipher, drawing conclusions from stated facts, completing sentences, listing steps in problem solving, composing limericks, comprehension of reading, and an intelligence problem. There were three equivalent forms of each task; Ss first did one form as individuals; second, another in ad hoc groups; and third, the remaining form as individuals. On all nine tasks, the average achievement of groups was superior to that of individuals; the differences, however, ranged from small and insignificant for reading comprehension to large and significant for completing sentences. In speed, on the average, groups were superior to individuals. For the nine tasks, on the average, about a third of the individuals were superior to their group in score and in speed. Such superiority, however, was a function of the task: for instance, on antonyms, 11% of the individuals made scores superior to their group, in contrast to 50% of the individuals who did better than their group on the intelligence problem. The order of group superiority is as listed above.
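Watson’s “concocted group” from the first study amounts to the union of the members’ individual word lists, with duplicates counted once. The minimal sketch below assumes that reading; the function name, the stimulus word, and the three word lists are hypothetical.

```python
def concocted_group(word_lists):
    """Union of the distinct words produced by the members working alone."""
    pooled = set()
    for words in word_lists:
        pooled.update(w.lower() for w in words)
    return pooled


# Hypothetical first-trial lists from three members (stimulus word "PAINTERS").
members = [
    ["pain", "tern", "sent", "rein"],
    ["pain", "stern", "paint", "tapes"],
    ["sent", "stain", "paint", "rents"],
]
pooled = concocted_group(members)
best_individual = max(len(set(words)) for words in members)
print(len(pooled), best_individual)  # 9 distinct pooled words vs. 4 for the best member
```

Comparing the size of such a pooled set with the number of words an interacting group actually produced is the comparison that suggested an inhibiting effect of interaction.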

In 1932 Shaw (55) compared groups with individuals in the rational solution of complex problems. A class in social psychology was divided into halves. In the first period, half the class worked in five ad hoc groups while the others worked as individuals; in a subsequent period the roles of the two halves were reversed. In the first period, the task was the solution of three very similar classical “mathematical recreations” puzzles, e.g., that of the three beautiful wives and their jealous husbands who had to cross a river in a rowboat carrying at most three persons, under the constraints that all of the husbands but none of the wives could row and that no husband would allow his wife to be in the presence of another man unless he was also present.

In the second period, the problems were quite different: (a) rearranging words to form the last sentence of a prose passage; (b) rearranging words to form the last three and a half lines of a sonnet; and (c) finding the most economical routes for two school buses to bring children to a common school under the constraints of maximum bus capacity and a specified number of pick-up stations. For the puzzles, obviously, there is just one right answer; for the word rearrangements, however, the correct answer is
arbitrarily the original word order, 
and for the school bus problem, it is 
the one that gives minimum mileage. 
Ss are more likely to be able to verify the solution for the three mathematical puzzles which were given in the first period; but for the second-period problems, they have no way of verifying their solutions because the correct answer was arbitrary. For instance,
word rearrangements can be com- 
pletely appropriate in meaning de- 
spite deviation from the original word 
order. Puzzles having unique solutions may be termed “Eureka” since
Ss can, and do, get confirmation for 
correct solution. 

For the first period, on the so-called 
Eureka problems, three of the 21 in- 
dividuals and three of the five groups 
solved the first problem; no indi- 
vidual and three groups solved the 
second problem; and two individuals and two groups solved the third problem. No individual solved more than
one problem, but just three groups 
made the eight group solutions. Two 
groups and 16 individuals never 
solved any of the three puzzles. For 
the second period problems, three of 
17 individuals and four of the five 
groups solved the first problem com- 
pletely; a fifth group and seven other 
individuals made just one error. No 
individual and no group solved the 
other two problems. Group superiority rests only on the eight solutions by groups in contrast with the five by individuals. In general, interpreters of the Shaw experiment have disregarded not only the similarity among the three problems but also the fact that the comparison was based on solutions summed over all problems rather than on the numbers of individuals and of groups solving each problem. For instance, when only the solutions by individuals and by groups for the first problem are compared, there is no statistically significant difference. Shaw discussed neither the fact that two of the groups never solved any of the three puzzles nor the relative efficiency of three solutions among 21 individuals versus three solutions for five groups of four members each, i.e., 20 individuals altogether.

Shaw advanced methodology by her more rigorous procedures for studying problem solving by individuals and by groups; however, the interpretations implicit in her conclusions do not conform to the constraints imposed by the kind of problems, the type of Ss, or the possibility of transfer of training. The fact that two groups never solved any puzzle, and that three groups obtained all eight solutions, suggests two hypotheses for research: (a) that transfer of training is more likely in groups and (b) that group solution is possible only if at least one member, working as an individual, could have solved the problem.
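Hypothesis (b) yields a simple baseline for comparing Shaw's groups with her individuals. The sketch below is a minimal illustration under assumptions of our own, not Shaw's: each member of a four-person group is taken to solve independently at the individual rate observed on the first puzzle (3 of 21), and a group is assumed to succeed exactly when at least one member could have solved alone. The function name is hypothetical.

```python
from fractions import Fraction

def p_group_has_solver(p_individual: Fraction, group_size: int) -> Fraction:
    """Probability that at least one of `group_size` independent members
    solves the problem: 1 - (1 - p)^n."""
    return 1 - (1 - p_individual) ** group_size

p = Fraction(3, 21)                  # individuals solving the first puzzle
p_group = p_group_has_solver(p, 4)   # Shaw's groups had four members each
expected_groups = 5 * p_group        # five groups worked in the first period

print(p_group, float(p_group))
print(float(expected_groups))
```

Under these assumptions about 2.3 of the five groups would be expected to contain at least one potential solver, against the three groups that actually solved the first problem.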

Shaw accounted for group super- 
iority on the basis of observations 
that groups rejected incorrect solu- 
tions and checked against errors. 
Since her results differed from problem to problem, her interpretation might instead have been that groups were superior on problems with just one unique answer, but that there is no genuine difference on problems with a wide range of answers. Lorge and Solomon (38) re-examined the data for
the Eureka problems in 1955 and sug- 
gested other explanations for group 
superiority. Their work is reported 
later in the section on mathematical 
models. 

The question of the relative efficiency of three solutions among 21 individuals versus three solutions for five groups of four members each, i.e., 20 individuals, was investigated by Marquart (42) in 1956. She essentially replicated Shaw's experiments with similar results. However, Marquart noted that Shaw's conclusions about group superiority hinge on comparing percentages of possible successes obtained by individuals to percentages of possible successes obtained by groups. A fairer comparison, she proposed, involved treating individual successes on a group basis: e.g., if, when working individually, one of the three individuals who later make up a group of three gets the correct answer, the individuals are credited with one success in one trial rather than one in three. If, on the other hand, no correct solution is forthcoming from any of the three individuals, then one failure is attributed instead of three. On this basis, the individuals turned out to be slightly superior in both Marquart’s results
and in Shaw’s. 
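Marquart's scoring rule can be written as two small functions. In this sketch each inner list holds the individual outcomes (True for a correct solution) of the members who later form one group; the data and function names are hypothetical.

```python
def per_person_rate(groups: list[list[bool]]) -> float:
    """Shaw-style scoring: every individual counts as a separate trial."""
    outcomes = [solved for members in groups for solved in members]
    return sum(outcomes) / len(outcomes)

def group_basis_rate(groups: list[list[bool]]) -> float:
    """Marquart-style scoring: each set of future group-mates is one trial,
    credited with a success if any one of them solved while working alone."""
    return sum(any(members) for members in groups) / len(groups)

# Hypothetical outcomes for 12 individuals who later form four groups of three.
individuals = [
    [True, False, False],
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
print(per_person_rate(individuals))   # 2 successes in 12 trials
print(group_basis_rate(individuals))  # 2 successes in 4 trials
```

The group-basis rate is the figure to set beside the proportion of real groups that solved: a set of three in which one member succeeds counts as one success in one trial, as Marquart proposed.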




Shaw, however, did consider her conclusions limited by problem type. In 1938 Thorndike (65) investigated the hypothesis that as the range of responses increased, the superiority of the group over individuals increased. Thorndike used two versions of each of four problems: one with a “limited” number of responses and the other with an “unlimited” number. For instance, a multiple-choice item with four options was paralleled by an open-ended version; similarly, completing a crossword puzzle was paralleled by constructing a crossword puzzle. The other tasks were limerick completion (either one line or three lines to be supplied) and a vocabulary test of synonyms (five-choice or recall). Ss were college students, who worked in four two-hour sessions a week apart, two as individuals and two as groups. For all four tasks, differences were in the direction of the hypothesis, with three of them significant.

Thorndike’s tasks, in a sense, contrast recognition versus recall in groups and in individuals. The recognition item form favors groups. This indicates, as Shaw had suggested, that group superiority results more from members pooling information by rejecting incorrect options than by contributing options for consideration. Thorndike’s problems differed so much from those of Shaw as to suggest that generalizations about problem solving and about group superiority depend upon the nature of the tasks.

Husband (28) in 1940 studied a group in contrast with an individual as measured by the man-hours required to arrive at a solution and by the quality of the product. He used three tasks: deciphering a code, solving a jigsaw puzzle, and solving arithmetic problems. Ss were students in psychology, 40 working alone and 80 in pairs. Some pairs were friends; some were strangers. Pairs were superior on the first two tasks, but on the third (arithmetic problems) there was no significant difference. Husband suggests that on the arithmetic task one member of the pair tends to take the lead and do all the work. In all comparisons, pairs of strangers did better than pairs of friends. Husband's results reinforced the conclusions of some of the earlier studies about originality and routine performance. His pairs did better on problems requiring some originality or insight than on the more routine arithmetic problems; this confirmed Thorndike’s hypothesis that the superiority of the group product over the individual product is greater in problems with unlimited solutions than in those with limited alternatives, and confirmed Watson's and Shaw’s findings that the group handled complex problems adequately. Regarding efficiency, however, he indicates that the time saved by pairs was never more than a third, not the half needed to equate man-hours for pairs and for individuals, although Husband failed to consider the better quality obtained for the time used by pairs.
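The efficiency comparison is arithmetic in man-hours, i.e., the number of members multiplied by clock time. The sketch below normalizes an individual's solution time to 1 and applies the largest saving Husband reported for pairs (one third); the helper names are illustrative.

```python
from fractions import Fraction

def man_hours(group_size: int, clock_time: Fraction) -> Fraction:
    """Total person-time spent: every member is occupied for the whole session."""
    return group_size * clock_time

def break_even_saving(group_size: int) -> Fraction:
    """Fraction of an individual's time a group must save so that its
    man-hours equal those of one individual working alone."""
    return 1 - Fraction(1, group_size)

individual_time = Fraction(1)                       # normalized solution time
pair_time = individual_time * (1 - Fraction(1, 3))  # pair saves one third

print(break_even_saving(2))             # saving a pair would need
print(man_hours(1, individual_time))    # individual's man-hours
print(man_hours(2, pair_time))          # pair's man-hours
```

With these figures a pair spends 4/3 of the man-hours of one individual. The same helper gives `break_even_saving(4) == Fraction(3, 4)`: a four-person group would have to finish in a quarter of an individual's time just to match the individual's man-hours.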
After a long interval following Thorndike’s work, Taylor and Faust (63) compared individuals and ad hoc groups in solving the identity of a topic in the game of Twenty Questions. Elementary Psychology students worked for four days at the rate of four problems a day, either as individuals, in pairs, or in groups of four. On the fifth day, all Ss worked alone. Although time was recorded, the prime criterion was the number of questions necessary to reach a solution. In pairs and in groups, discussion was allowed, with the motivation that they were competing against other groups but not against each other. There were significant differences between the scores of individuals and those of pairs and of groups in questions, time, and failures. Except for failures, there were no significant differences between the pairs and groups. Of course, in efficiency, i.e., the number of man-hours to reach a solution, the group is inferior to the individual, with four-man groups less efficient than pairs. The gain from training during the first four days, as measured by each individual's performance on the fifth day, did not seem to differ according to whether that training came from practicing as individuals, in pairs, or in groups.

Taylor and Faust’s Twenty Questions approximated the Eureka problems, but it was also summative in that each member's contributions could add to the group result. Their data tend to corroborate Shaw and Watson, but they contradicted some studies of “learning” insofar as there was no transfer to individual achievement as a consequence of previous differential group or individual experience.

Research contrasting group and individual performance in “learning” suffered from a lack of experimental controls; research with problem solving suffered from a lack of reality, since the problems or tasks were far removed from the genuine and the real. The problems, in general, have been puzzles, riddles, or information-test questions. Results from such tasks were not sufficiently conclusive to allow an unambiguous generalization about the superiority of groups over individuals with more realistic problems.


MEMORY 


Little work in the area of group and individual memory has been completed. In 1952 Perlmutter and de Montmollin (48) experimented with group vs. individual learning of nonsense syllables. Twenty groups of three persons each were required to learn equivalent lists of nonsense words. One list was learned by each individual while working alone, but in the presence of the other two. A second list was learned as a cooperative three-person project. Half of the Ss worked first as individuals and half first as members of interacting groups. On all trials the average group recalled more words correctly than did the average individual. The group recall tended to be equal to or better than the best individual score, and those who worked first as members of interacting groups tended to do better as individuals than those who worked first as individuals. The converse was not found to be true. In agreement with Shaw (55), Perlmutter and de Montmollin noted processes of rejection and evaluation operating within the groups, and the results seemed to indicate that groups adopted fewer invented words and fewer words that represented modifications of those in the lists.

In 1953, Perlmutter (47) tested group vs. individual memory of “meaningful” material. A story entitled “War of the Ghosts” was read to 8 two-man groups, 8 three-man groups, 3 four-man groups, and 10 individuals. Recall was compared after 15 minutes and after 24 hours. No statistically significant differences were found, although the results favored the groups. The standard deviations of individuals' scores were nearly twice those of the three-man groups, indicating the possible existence of a group pressure toward conformity. Individuals required less time than either two- or three-man groups, at a statistically significant level, on both recall tests. Perlmutter concluded from this experiment that, on the one hand, hardly any evidence was found to support the extreme position that the content of the group memory product is unique and unrelated to the content of individual member recalls: very little correct information was found in group recall that was not in any member's recall. Conversely, some correct content was found in the recall of all or some of the individuals that was not found in the group memory product. He concluded that while it was interesting to attempt a derivation of the group product from individual products, in some respects the group product can be treated in its own right, and some principles of product change can be formulated without measurement of individual member memory.

The single generalization derived from these studies is that, in conformity with other studies reported in this review, there is evidence of both a depressing and an accelerating effect from group participation. These experiments do not aid in identifying or quantifying these effects.


SIZE OF GROUP 


A section on group size is included because of its profound effect upon group productivity. “Group” in contrast to “individual” is affected by a number of important variables, size being one of the most important, so much so that the term “group” can refer to materially different entities. It is important that knowledge of how a group product varies as conditions vary be utilized when comparing “group” and “individual” products.

In 1927, South (60) conducted an experiment with 1,312 Ss divided into groups of three and of six. Four types of tasks were assigned, ranging from the “concrete” to the more “abstract.” The tasks were: judging emotion from a series of photographs portraying emotion (abstract); answering multiple-choice questions (concrete); solving bridge problems (concrete); and judging English compositions (abstract). Results were recorded for the accuracy of performance and for the time required to complete the experiment. The results indicated that the size of the group affects its efficiency. In each of the four types of material there was a difference between the performance of groups of three and groups of six, depending somewhat on the type of material or the kind of problem given the group. The small groups were more efficient with “abstract” problems, while the larger groups did better with the “concrete” problems. South concluded that in the case of the abstract materials, the members had their own opinions after the first glance, and the committee’s task was largely that of compromising and overriding opinion. In the smaller group there were fewer opinions and hence less to do. For this particular type of problem, the small group was faster with no loss in accuracy.

Kelly and Thibaut (30) reported a study by Bales et al. (4) in which individual Ss were first ordered according to initiation rank, which is the degree to which they initiated responses in a group situation. Theoretical curves based on a harmonic distribution were then fitted to the obtained percentages of total acts contributed by members at each ranked position. For groups of size three and four the empirical curves were found to be flatter than the theoretical curves, but for groups of size five through eight the empirical curves were steeper than the theoretical curves. Thus it appears that the proportion of very infrequent contributors to the group interaction increases as the size increases. In the larger groups the discrepancy between obtained and expected frequencies was attributed to the large volume of participation by the highest initiator. This study suggested that as size increases from three to seven there is a sharp rise in the proportion of members who contribute less than would be expected if each member shared equally in the interaction. Beyond the size of seven, the proportion shows no consistent increase or decline.
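The theoretical curve can be sketched as follows. This assumes the simplest harmonic form, in which the member at initiation rank i is expected to contribute a share of all acts proportional to 1/i; the exact parameterization Bales et al. fitted is not given in this review, so the functions are illustrative. The second helper counts how many ranks fall below the equal share 1/n, the benchmark of “each member shared equally in the interaction.”

```python
from fractions import Fraction

def harmonic_shares(n: int) -> list[Fraction]:
    """Expected share of all acts for each initiation rank (rank 1 = most
    active), assuming shares proportional to 1/rank and summing to 1."""
    weights = [Fraction(1, rank) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def below_equal_share(n: int) -> int:
    """Number of ranks whose harmonic share is below the equal share 1/n."""
    return sum(1 for share in harmonic_shares(n) if share < Fraction(1, n))

for size in range(3, 9):
    shares = harmonic_shares(size)
    print(size, [round(float(s), 3) for s in shares], below_equal_share(size))
```

An observed curve flatter than these shares gives low-ranked members more than their harmonic expectation; a steeper one concentrates acts in the top initiator, the pattern reported for groups of five through eight.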

In 1951, Gibb (19) experimented with the effects of group size upon idea production in a group problem-solving situation. The Ss were 1,152 college students composed into groups of 1, 2, 3, 6, 12, 24, 48, and 96. The groups were asked to produce as many solutions as possible to a series of problems permitting multiple solutions. Each group session lasted 30 minutes. The results indicated that the number of ideas produced increased as a negatively accelerated function of group size in each of the two conditions. A valid criticism of this experiment is that the time limit of 30 minutes was not sufficient to permit an exhaustion of the potential contributions of all of the members of the larger groups. Furthermore, the problems may not lend themselves to more than a limited number of solutions. Of extreme importance, however, is Gibb's report that with increasing size a steadily increasing proportion of the groups’ members reported a feeling of threat or inhibition of their impulses to participate. This, in addition to the statistical results, supports the hypothesis of a restraining force resulting from increased size of groups.

Carter et al. reported a study (10) 
comparing individual participation in 
groups of varying sizes. They con- 
cluded that in groups of four, indi- 
viduals have sufficient space in which 
to behave, and thus the basic abilities 
of each individual can be expressed, 
but in the larger groups only the 
more forceful individuals were able 
to express their abilities and ideas, 
since the amount of freedom in the 
situation was not sufficient to ac- 
commodate all the group members. 

From this limited number of studies certain tentative generalizations can be made. As indicated by South, greater production on “abstract” problems can be expected from smaller groups than from larger ones, and greater production on “concrete” problems from larger groups than from smaller ones. Bales et al. (4), Gibb (19), and Carter et al. (10) indicated the possibility that, for problems of certain types, production will increase with group size at a negatively accelerating rate. When comparing the production of individuals with that of groups of varying size, these generalizations should be kept in mind. Considerable additional research is needed to confirm or refute these expectations.


PROBLEM SOLVING IN MORE 
REALISTIC SETTINGS 

Studies using genuine and significant situations are less numerous than those involving judging, learning, etc., partly because concern with the more genuine human relations problems has emerged quite recently and partly because of the practical difficulties in working experimentally with problems involving decisions. These decisions require the individual or the group to weigh alternatives for relative adequacy and then to select one, or some combination of several, as the most feasible solution, rather than to determine the correct answer. Thus, the criterion for appraising decisions in these experimental studies should differ from agreement with the one true order or the one correct answer; rather, the evaluation of a decision should ultimately be based on some system of credits for coverage and adequacy.

Timmons (67) used as criterion the experts’ rank order of five possible options for the genuine problem, “What type of parole system should Ohio adopt?” His research was oriented primarily to estimating the effect of discussion on the individual's ranking of the five options. The Ss were high school students in Ohio. Classes were divided so that some Ss worked as individuals throughout the experiment and others worked in specially constituted groups during part of the experiment. The controls (individuals throughout) (1) on Day 1 ranked the five options and took an attitude scale toward parole; (2) on Day 2 read a pamphlet containing authoritative information about parole, then again took the attitude scale and ranked the options; (3) on Day 3 reread the information pamphlet under the motivation of competing with groups discussing the problem, and then again took the attitude scale and ranked the options; and (4) after an interval of a month, were measured for attitude and for ranking of options.

The experimental group was treated identically for Steps 1, 2, and 4. The essential difference was in Step 3, in which six different kinds of groups were formed, based on performance on the first day. Each group was supplied with a copy of the informational pamphlet, discussed the problem, and formulated a ranking of the options as a group. When that had been completed, each member of each group took the attitude scale and ranked the options as an individual.

Timmons’ measure was the individual’s ranking of options, so much so that Timmons considered ranking by groups only incidentally and tangentially. In terms of the individual’s agreement with the expert ranking, the informational pamphlet produced a tremendous shift toward the experts’ rank (Day 2 minus Day 1). Individual study and group discussion resulted in further movement toward the expert ranking (Day 3 minus Day 2). The individuals who participated in group discussion were closer to the experts than the individuals who restudied the pamphlet. These changes were maintained, in general, a month later. For attitudes, gains followed only the reading of the pamphlet on Day 2 and at no other time, and there was no difference at any time between those who discussed and those who restudied the pamphlet.

In this major aspect, Timmons demonstrated a significant transfer from group discussion to subsequent individual rankings. Unfortunately, Timmons considered the group’s ranking a very minor aspect of his research. He reported that after discussion the groups’ average agreement score was 2.93, which was not significantly different from the average agreement score of 3.31 obtained immediately afterward by the individuals from those groups. The 3.31, however, was significantly better than the 6.70 of the individuals who had restudied the pamphlet.
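The agreement scores read most naturally as distances from the experts' ranking, where smaller is better (hence 3.31 being better than 6.70). Timmons' exact formula is not described here; the sketch below assumes one common choice, the sum of absolute differences in rank position, and uses hypothetical labels for the five options.

```python
def rank_disagreement(ranking: list[str], expert: list[str]) -> int:
    """Assumed score: sum over options of |position in ranking - position in
    the expert ranking|. 0 is perfect agreement; larger is worse."""
    if sorted(ranking) != sorted(expert):
        raise ValueError("both rankings must order the same options")
    expert_pos = {option: i for i, option in enumerate(expert)}
    return sum(abs(i - expert_pos[option]) for i, option in enumerate(ranking))

# Hypothetical labels for the five parole-system options (not Timmons' wording).
expert = ["A", "B", "C", "D", "E"]
print(rank_disagreement(["A", "B", "C", "D", "E"], expert))  # identical order
print(rank_disagreement(["B", "A", "C", "D", "E"], expert))  # one adjacent swap
print(rank_disagreement(["E", "D", "C", "B", "A"], expert))  # fully reversed
```

Under this assumed measure the three calls return 0, 2, and 12; 12 is the worst possible score for five options.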


Although Timmons formed six dif- 
ferent kinds of groups based on the 
amount of their agreement with ex- 
perts’ ranks initially, he failed to re- 
port the ranking of the various 
groups. Methodologically, the six 
groups were made up as: I. 4 Ss 
with good scores; II. 4 Ss with inter- 
mediate scores; III. 4 Ss with poor 
scores; IV. 2 Ss with good scores, 2 
Ss with poor scores; V. 2 Ss with 
good scores, 2 Ss with intermediate 
scores; and VI. 2 Ss with intermedi- 
ate scores, 2 Ss with poor scores. 

He reported, however, in terms of individual change, that the gains were largest for the poor students, smaller for the intermediate, and least for the good. The good made greater gains after discussing with other good students than after discussing with the poor or the intermediate. The good did not get worse after discussing with the so-called poor. The poor gained as much from discussion with the good as from discussion with the intermediate, but always significantly more than from discussion with the poor. From the viewpoint of learning by individuals, all individuals seem to benefit from discussion, even when the discussants were relatively less adequate.

In 1941, Robinson (51) investi- 
gated the effects of group discussion 
upon attitudes toward two social 
problems: capital punishment, and 
American policy to keep out of war. 
He contrasted college sophomores in 
43 ad hoc groups of from 8 to 20 
members as experimental samples, 
and 225 college sophomores as individuals. The experimental sample (a) studied group discussion theory for one month and had weekly practice discussions, (b) studied material on the problems, then (c) took
Thurstone Attitude Scales relevant 
to both problems, (d) had a two-hour 
discussion on each of the questions, 
and, finally, (e) took the attitude 
scales a second time. The control 
sample had no group discussion, but 
were given successively the two forms 
of the Thurstone Scales on each prob- 
lem. In one variation, an informa- 
tion test was added before and after 
discussion; in other variations, dis- 
cussion theory and the study of the 
informational material were omitted. 

Although significant changes in mean scores were made only in attitudes about how to keep out of war, Robinson noted that a consideration of the magnitude of the attitude shifts by individuals revealed significant changes on both problems in all groups. When the informational test was used, the individuals showed gains after discussion. However, without comparable data for the control, this gain cannot be attributed to individual versus group superiority. In a third experiment, Robinson, comparing change in attitude after reading informational material with that after a 30-minute group discussion, found that the magnitude of the shifts by individuals after reading exceeded not only those after the 30-minute discussion but also those after the two-hour discussions in the earlier experiments. This lack of shift after discussion, contrary to Timmons’ findings, may be attributable to the inadequacy of the experimental designs, or to a genuine difference between the sequelae of reading and of discussion.

Robert Thorndike (66) hypothesized that much of the superiority of groups over individuals was attributable to the elimination of (a) individual chance errors and (b) those errors differing from individual to individual and from time to time. The second hypothesis had been partly confirmed in the Gordon experiments. Thorndike attempted to isolate the part of group superiority that was due to averaging or summing individual contributions from the part due to the elimination of each individual’s chance errors. He used 1,200 college students formed into 220 ad hoc groups of from four to six members. They worked on 30 problems, e.g., selecting the better of two poems, the more socially significant of two headlines, etc. The design required a choice by each individual together with a measure of confidence; then, after discussion, a group choice. Both the individual and the subsequent group choice were completed for each problem before the Ss proceeded to the next problem. There were significant differences between the mean scores for all individuals before discussion, for “concocted” groups before discussion, and for ad hoc groups after discussion. The analysis of the group product revealed that part, but not all, of the difference between group and individual results can be explained as a consequence of the pooling of the individual products. As Gurnee found in group judgments, the group product was more than an expression of the majority of the members comprising the group, and Thorndike found this “more” attributable to the discussion among the group’s members.
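A “concocted” group answer can be formed without any discussion by pooling the members' pre-discussion choices. The pooling rule in this sketch (most votes wins, ties broken by summed confidence) is an assumption for illustration, not necessarily Thorndike's procedure, and the data are hypothetical. Whatever the pooled answer fails to reproduce in a real group's post-discussion answer is the “more” attributed to discussion.

```python
from collections import defaultdict

def concocted_choice(responses: list[tuple[str, float]]) -> str:
    """Pool (choice, confidence) pairs into one answer with no discussion:
    the choice with most votes wins; ties go to the larger summed confidence."""
    votes: dict[str, int] = defaultdict(int)
    confidence: dict[str, float] = defaultdict(float)
    for choice, conf in responses:
        votes[choice] += 1
        confidence[choice] += conf
    return max(votes, key=lambda c: (votes[c], confidence[c]))

# Hypothetical five-member group choosing the better of poems "A" and "B".
members = [("A", 0.9), ("B", 0.6), ("B", 0.7), ("A", 0.8), ("B", 0.5)]
print(concocted_choice(members))  # "B" wins, 3 votes to 2
```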

The data also were analyzed for the consequences of grouping, i.e., the effects when the majority were correct before discussing in the group, as opposed to when the majority were incorrect. When at least 70% of the individuals were correct before discussion, there was a gain of 11% after group discussion. When less than 50% of the individuals were correct before discussion, there was a change of 7% after group discussion. This result indicates that Jenness’ earlier hypothesis, that disagreement among members is more conducive to group improvement than is agreement, needs qualification. His hypothesis may be correct with judgments, where awareness that others do differ results in restudy, but may not apply in problem solving or decision making when disagreement involves a majority with an erroneous view. Indeed, Farnsworth and Williams (18) made the same point in their demonstration that when all group members were likely to be in error, there was no reason to expect the group product to be better than that of individuals.
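The point about erroneous majorities can be illustrated with a mechanical majority vote. This minimal sketch assumes, for illustration only, that members answer a two-choice problem independently with the same probability p of being correct and that the group simply adopts the majority view with no discussion; neither assumption comes from the studies reviewed.

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent members, each
    correct with probability p, is correct on a two-choice problem."""
    if n % 2 == 0:
        raise ValueError("use an odd group size so a strict majority exists")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

for p in (0.7, 0.4):
    print(p, round(majority_correct(5, p), 3))
```

For a five-member group, the majority is right about 84% of the time when each member is right 70% of the time, but only about 32% of the time when each member is right 40% of the time: mechanical pooling amplifies whatever tendency the members already have, so a group whose members are likely to be in error gains nothing from it.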

Timmons (68) concluded that after allowance is made for the averaging of individual contributions, a significant superiority for all groups still remains; similarly, after allowing for the effect of majority influence, an insignificant amount of superiority is reported. When allowances are made both for averaging effects and for majority influence, there is “a large, but not significant” difference favoring the group. Timmons suggested that, considering the rigor of his methods, “it seems probable that the differences are even more significant than they seem to be.” He suggested four factors possibly inherent in discussion that may account for unexplained differences: the group (a) has more suggestions leading toward the solution (cf. Shaw), (b) has a wider range of interpretations of the facts of the problem, (c) has a wider range of criticism and suggestion (cf. Shaw), and finally (d) has more information (cf. Lorge and Solomon).

In 1955 Lorge et al. (39) experimented with the difference in the quality of solution to a practical problem which was presented in four settings differing in degree of remoteness from reality. The problem was presented as a verbal description, a photographic representation, a miniature scale model not allowing manipulation of parts and materials, or a miniature scale model allowing manipulation of parts and materials. The problem consisted of finding a way to get a squad of soldiers across a specified segment of mined road as quickly and secretly as possible using a limited number of available props. The problem was adaptable to many solutions of varying quality, and had the characteristics of a genuine field situation. Ten teams of five AFROTC students and 10 individuals worked on the problem. Any individual or group had the right to ask the experimenter as many questions as desired, and the experimenter made an effort to answer all questions as long as they did not directly divulge a method of solution. The results indicated no significant differences among the solutions at the four levels
of remoteness from reality. However, 
at all levels the solutions of the 
groups were markedly superior to 
the solutions of the individuals. It 
was concluded that the differences 
may be due in large part to the 
amount and kind of information the 
Ss had at their command. It was 
noted that the number of questions 
asked of the examiner increased with 
the remoteness of the model from 
reality. This worked toward equal- 
ization of the information available 
to all groups. Groups asked more 
questions than the individuals at all 
levels of remoteness, which meant 
that the groups had more informa- 
tion to work with than did the indi- 
viduals, and may account, in part, 
for their superiority. 

Lorge et al. (36) completed a study 
in 1953 comparing the quality of 
group and individual solutions of 
human relations problems before 
and after class instruction in staff 
procedures. At the beginning of the 
course, one half of the Ss spent one 
period in problem solving as indi- 
viduals, while the other half worked 
as groups. In the very next period, 
those who had worked as individuals 
were formed into groups, while the 
earlier groups were dissolved and 
their members worked individually. 
The design was replicated six months later, at the end of the course. The results indicated that, as a result of this particular form of instruction, the quality of decisions prepared by ad hoc staffs after training was significantly superior to that of decisions prepared by ad hoc staffs in the opening week of class. By contrast, and of interest, the decisions written by individuals after instruction were not significantly different from those they had prepared as individuals in the opening days of the course. This indicated that individuals may improve in their performance of a given task as members of a group without showing improvement in their performance of the same task as individuals.

These results emphasize that the group is not necessarily superior to the individual in human relations decisions. Before instruction, the quality of individual decisions was significantly superior to that of group decisions. This difference in favor of individuals may indicate merely the relative ineffectiveness of ad hoc groups in solving the problem in the given time. In some of the appraisals it was found that groups, before instruction, lost upwards of 80% of the ideas that their constituent members had held as individuals for the solution of the problems, and many of these lost ideas were important. At the beginning of instruction, more than 75% of the individual decisions were superior in quality to the best of the group decisions. Since the decisions of individuals at the end of instruction did not differ in quality from those at the beginning, the presumption was that the gain lay in group interaction and not in problem-solving skill, whether among individuals or in groups.

The data also indicated that the probability that any individual's idea will be expressed in the group decision is a function of the commonality of the idea, i.e., of the number of individuals who had the same idea prior to the group meeting. Of all the ideas held in common by two or more group members prior to the meeting, half appeared in the group decision. Only 10% of the unique ideas, those possessed by only one person prior to the meeting, ultimately appeared in the group decision. Similarly, only one third of the ideas in the group decisions were original, i.e., ideas that none of the group members had mentioned in their earlier individual decisions, whereas two thirds had already been so expressed. These data suggested that group process does not generate original ideas but relies heavily upon ideas formulated prior to the group meeting.

A further aspect of the group's involvement with decisions appears in the studies of changes in food habits conducted under the sponsorship of Lewin (35). The basic comparison was the carrying into action of decisions made either in groups or as individuals. As such, the Lewin studies concern a side effect of the group process without any reference to the quality of the decision.

One study was conducted with six groups of Red Cross volunteers in a home nursing course. The objective was to increase actual use of the unpopular “variety meats,” e.g., beef hearts, kidneys, and sweetbreads. Lecture was contrasted with group discussion as a means of inducing change, each method being used with three of the groups, with the same nutritionist offering the same recipes. At the end of the discussion, the women indicated by a show of hands whether they intended to try the new foods. On follow-up, 3% of the lecture and 32% of the group-discussion volunteers had used some of the “variety meats.” Lewin stated, however, that only subjects in the discussion groups were told of the planned follow-up.

In a study by Radke and Klisurich (50), six neighborhood groups of six to nine housewives each were organized with the objective of increasing home consumption of whole and evaporated milk. The contrasted procedures were lecture and group discussion. Follow-ups after two and after four weeks indicated that those who had taken part in the discussions showed significantly greater change in the desired direction. As in the Lewin study, the discussion groups were informed of the two-week follow-up but the lecture groups were not. Lewin, however, stated that neither sample knew about the four-week follow-up. The differences at the four-week check may be a consequence of successes with the new foods tried.

Klisurich and Radke (50) tried to have new mothers increase the amount of orange juice and cod liver oil fed to their babies. In the “individual” condition, each mother discussed the feeding of her new baby individually with the hospital nutritionist for about 25 minutes, after which she was given printed instructions on feeding. Both oral and written material stressed the importance of using orange juice and cod liver oil. In the “group” condition, other new mothers were formed into ad hoc groups of six for instruction and discussion of feeding. The time for a group of six was equivalent to that given any one individual, i.e., 25 minutes. Follow-ups were made after two and after four weeks. The results show that significantly more of the mothers who had made decisions in groups behaved in the desired fashion.

Lewin suggested that the results of the first two experiments may be interpreted as the consequence of (a) greater involvement in the group situation, compared with the more passive audience role in the lecture, (b) greater interest in the group discussion, or (c) the fact that only those in the discussion groups knew of the anticipated follow-up. The results of the third experiment were all the more striking because the individuals received special attention, much more than any group member did, and because the group members were farm mothers who had been unacquainted with each other before and had no subsequent contact after leaving the hospital. Nevertheless, a 25-minute discussion among six such strangers produced much greater change than did a 25-minute consultation with an individual. Lewin considers the third experiment as indicative either of greater individual involvement in the group decision, even under the described conditions, or of a tendency for decisions made in groups to lead to action.

Levine and Butler (34) replicated the Lewin experiment, attempting better controls, i.e., avoiding differential expectations. Their Ss were factory supervisors who regularly gave ratings that determined the base pay of workers in their departments. It had been shown previously that the supervisors tended to rate the job rather than the man, i.e., workers in the more highly skilled jobs were consistently rated higher than those in the less skilled jobs. The experimenters set out to train the supervisors to rate the performance of the man, not the level of the job. The 29 supervisors were randomly divided into three groups. One group had no training, one had a 90-minute discussion on improving rating, and the third had a lecture followed by a question-and-answer period totaling 90 minutes.

The control group did not change, rating the men on the more skilled jobs higher than the men on the less skilled jobs. “The lecture method had practically no influence upon the discrepancies in rating.... Performance ratings were affected significantly only after the raters had had a group discussion and had reached a group decision.” The data, however, are not sufficiently rigorous to supply definite evidence for the superiority of the group technique.

In 1950, Maier (41) compared groups and individuals in decisions on a realistic problem. The task was to plan what to do about the slowest man on a “parasol,” or circular, assembly line; the slowest man held up the whole line. The problem was presented in two ways: (a) as a bare statement of the problem, or (b) as the statement of the problem plus a description of the roles of the eight different members of the “parasol” assembly.

The Ss solved the problem in ad hoc groups and as individuals. All of the groups were assigned discussion leaders, only some of whom were trained in techniques of group leadership. The trained leaders, however, knew the “elegant,” or experimenter’s, solution.


Maier’s primary conclusion is that a trained leader can improve the quality of the group product. The conclusion is limited, of course, to situations in which the trained leader knows the experimenter’s solution. Under these conditions, Maier also found greater individual acceptance of the “elegant” solution. This illustrates the importance of appraising side effects, such as acceptance of a decision, as was done by Lewin (cf. 35).

PRODUCTIVITY 

In industrial situations, people work in genuine life settings with real problems. Since the problems are genuine, the participants are usually motivated, often highly motivated. An example of interaction used to influence productivity is Bavelas’ study (see 40, pp. 264-266), in which three groups of factory workers met with a psychologist to set a new production goal for themselves. Their previous production “high” had been 75 units, their previous average was 60 units, and the goal they set was 84 units. This goal was achieved. They then met again and set a goal of 95 units, which they failed to reach, but production stabilized at 87 units. Two other teams of workers serving as controls met with the psychologist but set no goal; they showed no significant variation from their previous average of 60 units.

Coch and French (11) reported a study using as Ss factory workers with an average of eight years of schooling. They investigated the effects on productivity of three degrees of participation in decision making. All groups had been producing at 60 units before the job change. The control group had no participation in making decisions about the job change: management gave them the reasons for the change and answered any questions they raised. Experimental Group I participated only partially in the decision making: its members elected a committee, which made the decisions. Two other experimental groups participated completely, making all decisions as a group. The decisions pertained to design of the new job, determination of new rates, training methods, etc. The control group dropped, on the average, 10 units per hour, had an exceedingly high rate of turnover, and learned the new job slowly.

Production in the experimental groups dropped initially but quickly rose to the old levels. Under complete participation, productivity reached a new “high,” 15% above previous production rates; relearning was very rapid, and turnover was practically nil. Coch and French concluded that the speed of regaining old production levels was directly proportional to the degree of participation in the decision to make the changes. That this was not merely a function of the specific personnel involved was shown when the control group became the experimental group for a later experiment and showed the same quick recovery and a new production high on the new job.

Marrow (43), president of the 
company in which Coch and French 
conducted their studies, reported 
that union-instituted job changes 
with money bonuses had failed to 
change production rates or to get 
workers to accept job changes. When 
the program of group decision-mak- 
ing was instituted, a control group 
that was changed by the customary 
technique objected bitterly: 17% quit 
the job, and the rest showed little 
improvement. The experimental group, on the other hand, as noted earlier, reached and then exceeded its old levels, with none quitting.
Marrow concluded, as did Coch and 
French, that participation is the key 
to success in group production. 

In 1952 Darley et al. (13) reported a study of group productivity only. The groups were residents of 13 women’s cooperative housing units at the University of Minnesota. Each house (or group) contained 7 to 16 students and had its own president. The authors stressed that “efforts were made to create an in-group spirit that would characterize residents of the village and give them a feeling of belongingness to the group.” The students had been living together in the house for several months, long enough to be considered to have developed a group “tradition.” The task was to prepare a “plan for better cooperative living in the village,” a first instance of a complex human relations problem set not merely in a realistic but in a “real” situation of actual concern to the Ss. Faculty judges ranked the reports for quality so that cash prizes could be awarded.

In general, studies on productivity have advanced beyond the early use of the group as a social climate or as a means of inducing competition or cooperation. Recent work is moving toward studying people in real situations making decisions basic to their everyday work. This, however, has involved a subtle change from participating in selecting among alternatives for a decision, as Lewin's Ss did, to participating in working out the action plan that will give effect to a decision already made, as in the Coch and French (11) study.

Furthermore, productivity experi- 
ments have tended not so much to 
contrast the participation in a group 
setting with participation in the in- 
dividual setting, but, rather, to con- 
trast different amounts and kinds of 
participation where the fullest par- 
ticipation usually was in a group 
setting. More valid comparisons 
might have been made if each indi- 
vidual in the control group had been 
consulted, had had an opportunity 
to discuss the whole problem, and had 
made his wishes known. Then the 
comparison would have been between 
the effect on productivity of par- 
ticipation as individuals and as a 
group. 


MATHEMATICAL MODELS 


Two changes in methodology over the years, together with recent attempts to provide structure through mathematical models, are as important as the generalizations from the experimental data. The first change concerns the kind of group studied: it is a long step from Gordon's (22) statisticized group, or Allport’s (1) noninteracting face-to-face groups, to Bavelas’ (see 40, pp. 264-266) ad hoc groups and Darley's (13) “traditioned” group. Many of the early experiments did not produce results relevant to the problem under investigation. The “statisticized” group gives little evidence about real groups in real situations. As has been indicated, if all the judges are from the same population, the group products are sheer reiterations of the value of the Spearman-Brown prophecy formula.
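For readers unfamiliar with it, the Spearman-Brown prophecy formula predicts the reliability of the average of k parallel judgments as r_k = k r / (1 + (k - 1) r), where r is the reliability of a single judgment. The short Python sketch below (the function name is ours, for illustration only, and r is assumed to lie between 0 and 1) shows why pooling judges drawn from one population raises the score of a "statisticized" group in a purely arithmetical way, with no interaction among the judges.

```python
def spearman_brown(r: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel judgments,
    each having reliability r (Spearman-Brown prophecy formula)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r is assumed to lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r / (1 + (k - 1) * r)


# Pooling more judges of the same kind raises predicted reliability,
# with diminishing returns as k grows.
for k in (1, 5, 10, 20):
    print(k, round(spearman_brown(0.3, k), 3))
```

With r = 0.3, the predicted reliability rises from 0.3 for one judge to about 0.68 for five judges and about 0.81 for ten.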
Similarly, noninteracting groups shed more light on the significance of cooperation and competition, with and without an audience, than on the dynamics of grouping for the quality of the final product. Recent work with interacting ad hoc groups, and approximations toward the traditioned group, points the direction for estimating the consequences of groups in solving problems in real situations. There is some hope that group dynamics will be understood more fully, a hope that was more evanescent when grouping was by computation, i.e., without interaction.

A second advance is in the nature of the problem. The older, less realistic tasks of ranking weights or estimating intelligence from photographs are not a very crucial basis for estimating group superiority. Fortunately, the trend is toward human relations problems, as exemplified by Maier’s parasol assembly line.

Yet, despite the accumulation of evidence suggesting group superiority, the question of the efficiency of the group has seldom been considered. Thorndike has hinted that for some problems the group may not be as efficient as the individual. Husband (28) gives evidence that, in terms of man-hours per correct solution, the group is not as efficient as the individual. The inefficiency in time cost is implicit in Watson's first study, which showed the summated individual superior to the group. There is a practical need to specify the conditions under which the group may be used with the greatest efficiency, so as to channel group process to circumstances in which the consequences are commensurate with the time and manpower used. It is on the question of the efficiency of the group relative to the individual that mathematical models can at present play a very instructive role.
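Husband's criterion of man-hours per correct solution is simple arithmetic, but it shows how a group can be more accurate and still less efficient. The sketch below is a minimal illustration; the function name is hypothetical, and the figures in the usage lines are invented for the example, not Husband's data.

```python
def man_hours_per_correct(members: int, hours_each: float,
                          attempts: int, correct: int) -> float:
    """Person-hours spent per correct solution.

    members    -- people working on each attempt (1 for an individual)
    hours_each -- hours each person spends on one attempt
    attempts   -- number of problems attempted
    correct    -- number of problems solved correctly
    """
    if correct == 0:
        return float("inf")  # no correct solutions: cost is unbounded
    return members * hours_each * attempts / correct


# Hypothetical figures, not Husband's data: an individual solves 6 of 10
# problems, a group of three solves 9 of 10, each spending half an hour per problem.
individual = man_hours_per_correct(1, 0.5, 10, 6)
group = man_hours_per_correct(3, 0.5, 10, 9)
print(f"individual: {individual:.2f} h, group: {group:.2f} h")
```

Under these invented numbers the group is more accurate (9 of 10 against 6 of 10) yet spends twice as many man-hours per correct solution.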

In this report we focus on group productivity in contrast to individual productivity. Another active area of group research for the last 25 years has focused on the study of group dynamics and group process without regard to group productivity. Kelly and Thibaut (30) take a strong position on this latter research activity and state that continuing research along these lines “is indicative of major inadequacies in the research field,” and that the reason for continuing research of this type “may be suspected to lie in the absence of any good theory about either individual or group processes ...” (p. 780). While the senior author of this report recognizes the merit of this position, he feels that a more tenable position may be the recognition that both approaches have their legitimate place in this broad area of study and are supplementary rather than contradictory. The one real, though unfortunate, restriction that now exists is that simultaneous experimentation with social dynamics and processes and with end products is very difficult and usually unrewarding. This is probably due to the necessity for concentrating on the very basic concepts and minutiae of both areas. The eventual goal is to develop a sizable body of knowledge in both areas and perhaps to hope for the convergence of this knowledge into one unified and consistent theory. An implicit recognition of this is also contained in Kelly and Thibaut’s article, in their discussion of the requirements for an understanding of the social processes involved in problem solving. Here again the exploitation of mathematical models might prove useful.

The recent interest in mathematical models is exemplified in papers by Lorge and Solomon (38) and by Hays and Bush (26). Lorge and Solomon provided two mathematical models to reproduce the Shaw data. In their mathematizations they essentially say that the observed better group performance can be explained simply on the basis of individual ability or the interaction of individual abilities, a conclusion at variance with Shaw’s generalization that positive personal interaction produces the observed better group performance.
Hays and Bush use the Humphreys-type learning experiment to model mathematically the performance of groups of three and of individuals. They consider two conceptualizations: one in which the group acts as an individual, and one in which majority vote in each instance determines the group performance. These two conceptualizations were thought of as boundaries for group performance. An experiment was then run, and the results obtained did lie between the two expected group performances. One importance of mathematization is that it provides the design for a next step in experimentation that might never be revealed solely by examination of data. In the Lorge-Solomon case, some new experimentation was indicated by the results. In the Hays-Bush study, their two models indicated new experimentation where none had been performed before. An interesting account of these two papers, and of several others not directly related to problem solving or learning, is given by Coleman (12) in a survey of mathematical models in small group behavior.
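The difference between the two conceptualizations can be illustrated with elementary probability. Assume, for illustration only, that on a given two-choice trial each of three members independently makes the correct choice with probability p. A group acting as one individual is then correct with probability p, while a group following majority vote is correct when at least two members are, p^3 + 3p^2(1 - p). This single-trial sketch is our simplification; Hays and Bush's models describe learning across trials and are not reproduced here, and the function name is hypothetical.

```python
from math import comb


def majority_correct(p: float, n: int = 3) -> float:
    """Probability that a strict majority of n independent members,
    each correct with probability p, is correct (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd group size to avoid ties")
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(need, n + 1))


# Compare the "group acts as an individual" value (p itself) with majority vote.
for p in (0.3, 0.5, 0.7):
    print(p, round(majority_correct(p), 3))
```

For p = 0.7 the majority is correct with probability 0.784, and for p = 0.3 only 0.216: majority vote pushes the group away from 0.5 in whichever direction the members already lean, so the two conceptualizations predict different group performance.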

In general, in the evaluation of the relative quality of the products produced by groups in contrast to those produced by individuals, the group is superior. The superiority of the group, however, all too frequently is not as great as would be expected from an interactional theory. In many studies, the product of the “best” individual is superior to that of the “best” group. It is quite probable that group solution has its advantages in stimulating members toward, and inducing their cooperation in, a common solution. Yet it must be recognized that group procedures may have disadvantages, too. A single member, or a coalition of members, may retard the group by holding out for its own kind of solution, a consequence that may reduce the quality of the group product if the solutions so proposed are inadequate or unrealistic.

Obviously, it would be valuable to study group processes to ascertain how members facilitate or inhibit the development of a group product. But it is just as important to ascertain the quality of the products developed by groups in contrast to those of individuals. The researches reviewed in this survey are limited to the experimental assessment of the quality of group and of individual products. Currently, the research literature is generally in terminological confusion: the group involved in the experimental studies is an ad hoc group created by the experimenter for the experimenter’s purposes, but the experimenter generalizes to a traditioned group that has organized for a common purpose and has been interacting over a considerable period of time. A similar conflict of terms is involved in the definition of the individual. Experimentally, the individual is a person selected at random from some population, but the interpretation suggests that even the best individuals cannot appraise or solve a problem in the round. Since some individuals can, the essential question is how the efficient and able individual compares with the traditioned and able group.

In the researches reviewed, moreover, the range of tasks varies from estimating (e.g., the number of beans in a bottle) to solving genuine prob-

Public concern is currently being expressed over commercial exploitation of subliminal perception. This use of subliminal perception is probably related to the impetus given to studies of unconscious perception by work in perceptual defense and subception, a neologism from subliminal perception. Two reviews of the literature have recently appeared. One includes experiments in subliminal perception (1), and the other covers subception and perceptual defense (92). Both reviews systematize the literature, but neither pays sufficient attention to the relationship between the psychophysical methods employed and the data collected. No recent review or discussion (cf. 193) refers to current developments in psychophysics, which are a major breakthrough in this area and which necessitate reformulation of many concepts in perception. In contrast to preceding reviews, this review will concentrate systematically on psychophysical indicator methodology. The major point of this paper will be that much of the controversy in the area of unconscious perception is peripheral to the central perceptual problem of the psychophysical procedures utilized.
¹ The research reported herein was performed pursuant to a contract with the United States Office of Education, Department of Health, Education, and Welfare, and was made possible by grants provided by the National Science Foundation, to whom the author is greatly indebted for generous support.


The focus of this review will be experiments in which E seeks to obtain thresholds or related perceptual measures. Since the discussion will be concerned with methodology, it will concentrate on experimental investigations rather than on discussions or historical accounts; such antecedents may be found in other reviews (e.g., 32, 34). A further limitation is given in the title. This discussion will consider those experiments in which the perceptual measures obtained are related to unconscious processes by E or by others in the field, either in affirmation or in negation. Additional material will be presented from research in which this is not an issue but which is relevant to the discussion. A later paper will extend the methodological analysis to pertinent areas of perception not covered here.

No position will be taken on the issue of defining perception as: (a) a concept which relates certain sets of operations and logical relations, (b) a sensation related to physiological variables and often implying a subjective experience, or (c) a subjective experience. This review, however, will be concerned with all three definitions, since it will consider systematically what is procedurally common to all of them, namely, a response indicator.

By indicator (14) is meant that 
class of responses which contributes 
the response ingredient (often called 
the dependent variable) to the defini- 
tion of perception which E accepts. 
In addition to specified sets of responses or operations (cf. 67), perception experiments are also defined by certain experimental procedures, logical relations, and historical antecedents. All of these will be subsumed under methodology. Perception can be considered simply as a concept (or schema)² methodologically defined in reference to O, and measured by the indicator. Use of the indicator may be extended to measure some sensation or process which may imply a subjective experience, or to measure some state of mind, such as awareness. Although verbally the most cumbersome, the schematic usage of the indicator is scientifically the simplest, since, no matter what extensions are made, the adequacy of the response to indicate perception will depend upon the methodological adequacy of the experiments involved. If, for example, E requires S to press a button every time a light goes on, and the button requires 25 pounds of pressure, the response will not continue for long. E should not then be expected to report either that perception (concept) had been attenuated, or that visual sensation had diminished, or that awareness of the stimulus had fallen off. All describers would agree that the procedures employed had invalidated the indicator, and for precisely the same reason that a test is invalidated: the indicator is admitting variance from extraneous sources. In the button case cited, the extraneous source is a historically demonstrated fatigue variable. Had this variable not been previously isolated, there might be argument concerning perception. In other cases, the indicator might be invalidated through the use of logic or operations not covered by the methodological definition. Hence, the methodology which defines perception minimally is basic to all definitions and inferences having added meanings as well.

² Actually, the term construct, alone or in combination as construct-perception, is preferable to concept or schema on a variety of grounds, among them etymological, and conveys the meaning intended. Unfortunately, its meaning has tended to become restricted, among psychologists at least, to indicate the usage implied in the classical differentiation between hypothetical construct and intervening variable.


1. INDICATORS OF PERCEPTION AND THE SUBLIMINAL PERCEPTION PARADIGM
In contrast to the usual perception experiment, where only one indicator is investigated, all experiments in subliminal perception deal with differences between two indicators of perception at related stimulus measures.

Of the two indicators of perception involved in subliminal perception, one is interpreted as an indicator of absence of awareness, that is, absence of awareness of the perception of a stimulus. The other is interpreted to indicate perception or discrimination of the same stimulus. Subliminal perception, discrimination without awareness, and the subliminal effect (the term to be used in this discussion) refer to the presence of both indicators at the same stimulus magnitude. The indicators can be separated in time. For example, a stimulus magnitude is found which relates to reports of no awareness; at this magnitude or lower, S later correctly identifies geometric forms. This difference will be called the discrepancy effect. The indicators can also be contemporaneous. For example, at a given stimulus magnitude S reports no awareness and also makes a correct identification. This will be called the asynchrony effect.

The foregoing constitute the para- 
digms for subliminal perception ex- 
periments. All involve the first in- 
dicator discussed. The second in- 
dicator varies, and forms the basis 
for the classification to be used in 
this discussion. 

The category with the greatest number of studies involves an accuracy indicator: S is scored for correctly identifying a geometric figure, letter, number, and so on. Asynchrony effects were obtained by Baker (5) using judgments of rotation of visual figures, by Collier (32) for visual and tactile forms, by Coover (33) for letters and numbers, by Landis and Vinacke (102) for colors, by Peirce and Jastrow (132) for increase or decrease in tactile pressure, by Vinacke (186) for colors, and by Williams (194) for geometric forms. Discrepancy effects were obtained by Binet (8) using letters, and also by Coover (34), by King, Landis, and Zubin (94) using geometric forms, by Miller using E.S.P. forms (121) and other visual forms (122), by Pillai (133) for letters voiced and shown, by Sidis (155) for letters and numbers whispered and shown, and similarly by Stroh, Shaw, and Washburn (173).
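With an accuracy indicator, a subliminal effect amounts to the claim that identifications made at stimulus magnitudes where S reports no awareness are correct more often than chance. One modern way to check this, offered as our illustration and not as the procedure of any study cited above, is an exact one-sided binomial test against the chance rate 1/n for n equally likely alternatives. The function name is hypothetical and the trial counts in the usage lines are invented.

```python
from math import comb


def binomial_tail(correct: int, trials: int, chance: float) -> float:
    """One-sided exact p-value: P(X >= correct) for X ~ Binomial(trials, chance)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, j) * chance**j * (1 - chance) ** (trials - j)
        for j in range(correct, trials + 1)
    )


# Hypothetical data: 40 trials on which S reported no awareness,
# five equally likely geometric forms (chance = 0.2), 14 correct identifications.
p_value = binomial_tail(14, 40, 0.2)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value indicates accuracy above chance on the "unaware" trials; whether that counts as perception without awareness still depends on the methodological adequacy of the awareness indicator itself.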

In contrast to the foregoing studies, which report subliminal effects, conflicting results have characterized use of an illusion indicator. Dunlap (46) interpreted the results with faint arrows of the Müller-Lyer illusion as indicating no awareness, and obtained size judgments in accord with the illusion. Titchener and Pyle (182) could not confirm this, and were supported by Manro and Washburn (118), 8 of whose Ss were unsuccessful, while 2 did obtain an illusion. Hollingworth (81) goes into very little detail concerning the method used by a student to get illusions in 17 of 20 Ss. Bressler reported that the illusion “increases gradually as the strength of the stimulus increases” (20, p. 250).

A tone previously reported unheard is shut off; this change is then
reported by S in experiments by Dunlap (47), Dunlap and Wells (48),
and Jastrow (91). Colored after-images to color reported unseen were
obtained by Fernald (59), challenged by Titchener and Pyle (183),
and reaffirmed by Ferree and Rand “under the right conditions” (60,
p. 196). These results were again obtained by the same authors (61),
and by Newhall and Dodge, who found that “thresholds for after-images
were lower than for stimulus color” (128, p. 8). De Laski (43) found
that thresholds for cutaneous form were lower than the two-point
threshold, as did Friedline (66).

In “conditioning without awareness,” responses to visual or auditory
stimuli are interpreted to indicate no awareness, and a conditioned
stimulus is presented at this magnitude or lower. The unconditioned
stimulus is supraliminal. Baker (6) reported pupillary conditioning
to a subliminal tone, the unconditioned stimulus being light. These
results could not be replicated by Hilgard, Miller, and Ohlson (77),
nor by Wedell, Taylor, and Skolnick (188). Indeed, Steckle (169) and
Steckle and Renshaw (170) had indicated difficulties in assaying even
the supraliminal conditioning of this reflex previously reported by
Cason (25) and Hudgins (88). A recent (1956) failure to obtain
supraliminal pupillary conditioning is reported by Crasilneck and
McCranie (38). Wilcott (192) has also reported negative results with
subliminal stimuli, as have Cason and Katcher (26). Taylor (178) has
reported positive results. Subception studies involve different
procedures and will be discussed separately.
Adams (1) presents a chart listing the various experiments with a
positive or a negative sign for conclusions pro and contra the
subliminal effect. This recalls an earlier summary which reported a
score of “three to one against Titchener and Pyle, with one tie”
(121, p. 564), and implies that scientific issues can be settled by
nose-counts, with 10 poorly controlled studies reaching one
conclusion outvoting a single well-designed dissenting study. The
studies discussed vary widely in methods and controls. Binet (8)
apparently bases his nonawareness indicator on one descending series
per S: a hysteric walks backward until he reports he can no longer
read the largest letter on a chart; he then writes smaller letters.
The Titchener and Pyle (182) repetition of Dunlap's study (46)
provided controls such as presenting the variable on both sides
instead of one, light adaptation, practice, better equipment, and
better analytic procedures. These two studies, at least, cannot be
equated. “Whispers” are used (155, 173), arrows are pencilled in
(118), and there are time lapses between the two indicators used
(94, 121, 122, 133, 155, 173). A more detailed discussion of some
individual experiments is presented by Wilcott (193). Full
psychophysical functions are seldom obtained. On the other hand,
Davis' study (42) involving muscle action potentials to subliminal
auditory stimulation is straightforward, as are many of the others
cited.




Semantic and Accuracy Indicators 


Use of one of the responses in subliminal perception experiments to
indicate awareness has historical precedent in early psychophysical
studies where Yes, No, I see are assumed to have stimulus-related
experiential referents. These referents are considered different
from those attached to guessing a triangle correctly, or to a muscle
action potential related to auditory stimulation. Because of the
historical application of such semantic referents to Yes-No and
similar responses, this class of indicators will be called semantic
indicators.

Needless to say, the continued use of such indicators has not been
predicated upon their correspondence to common sense semantics, but
rather upon their methodological fit, that is, upon their long
history of lawful relations with other variables under specified
conditions. It has accordingly been argued that the semantic
referents of the symbolic responses which constitute this indicator
are irrelevant to it, and that the responses can be considered
simply as responses with properties such as frequency, rate,
amplitude, latency, and duration. Regarding this argument, it should
be noted that the term, semantic indicator, is used here in a
neutral manner. Use of this term should not be construed as implying
a semantic referent for this indicator. It is the name of a class of
responses which can be so interpreted and which also need not be so
interpreted.*


* A behaviorist position holds that emotions, drives, states, and
sensations can be defined in terms of differing sets of logical
relations and empirical relations between observable changes in
environment and organism. The name that is given to each of these
often turns out to be the name of the experience involved (cf. 58),
e.g., love, hunger, anxiety, vision. Using such a term (or the term
semantic indicator) does not imply that the subjective experience
which is the common sense referent of the term is being discussed.
Someone wishing to use the sets discussed to indicate an experience
may do so. But when he wishes to conduct a test, it is the set he
must test, since this can be observed and agreed upon by others, but
the experiences cannot. For the present discussion, if we can agree
on proper tests of variables affecting the sets under discussion, we
can be tolerant of differences involving subjective or operational
meanings of terms.

The indicator paired with the semantic indicator varies in the
studies mentioned. Some of the miscellaneous studies in cutaneous
differences, color phenomena, and auditory on-off phenomena have
been cited in discussions of subliminal perception, but can be
considered peripheral to the discussion, since they can be
interpreted to indicate that within the same modality there is
differential sensitivity to classes of stimuli. There are, for
example, threshold differences for blue and yellow, for motionless
stimuli and those in motion, for high and low pitch, and so on. The
Müller-Lyer illusion issue is by no means settled, and the
conditioning studies would seem negative. The main body of data
supporting the subliminal effect comes from studies in
“discrimination without awareness” (121), where the semantic
indicator is coupled with an accuracy indicator.

Since two different indicators are involved, it is conceivable that
differences in results may be related to systematic differences
between the indicators themselves, such as types of scores taken,
response control, correction procedures, and other such differences
which are unrelated to the substantive issues raised. Accordingly, a
systematic comparison follows.

Response and semantic referent of response. Accuracy responses
usually designate the stimulus in some way. They identify it (e.g.,
stimulus is Triangle, Circle, 5, E), locate it spatially (e.g.,
stimulus Up or Down, break in circle faces East) or temporally (e.g.,
stimulus was presented in first interval), or count it (e.g., 4
dots). Semantic responses usually do not have this
stimulus-designation feature. Rather, they seem to have subjective
referents such as Yes, Confident, Seen, or terms which stand for them
(e.g., use 2 for Yes and 1 for No; underline once if Certain and
twice if Doubtful). This difference per se is not too useful for
methodological classification. For example, Yes is an accuracy
indicator when E presents a signal during one of m intervals, the
other intervals containing only noise, and S must state Yes during
one of the intervals and is scored for congruence between Yes and
signal. On the other hand, Down can be a semantic indicator. E can
present a column of alternating squares and circles, with colors
also alternating. Using phi-phenomenon procedures, the colors move
up, but the forms move down. Although S is forced to choose between
movement Up or Down, if each of these is an equally valid mode of
response (for a color-form categorization), this is not an accuracy
indicator, despite the fact that discrimination is forced in terms
of specified stimulus spatial relationships. The features which
distinguish semantic and accuracy indicators and which are
methodologically meaningful follow.

1. Indicator score. The accuracy score is not S's responses, but the
accuracy of these. Responses are converted to congruence between
responses and E's answer-grid sheet. The semantic score utilizes a
measure of the response itself.

2. Control over score. Since congruence defines accuracy, E exerts
considerable control over the score by his control of the answer
sheet. This is not the case in the semantic indicator.
3. Correction of indicator score.

(a) Chance congruence. The accuracy congruence discussed can occur by
chance alone. Correction involves consideration of the number of
alternatives on E's sheet. The term, accurate, as used here, implies
such a correction. There is no comparable a priori method of
correction for semantic indicators: the probability of Yes in Yes-No
is not 1/2.

(b) Response bias: general and accuracy. Bias is used here in the
sampling sense, namely, that not all responses will have an equal
probability of appearing in the response sample. It might be said
that a psychophysical experiment is concerned with response biases
of an observer which are related to the stimulus. Irrelevant bias
must be controlled. In accuracy indicators, two procedures are
available. One involves presetting sequences on the answer sheet so
that congruences obtained through response bias are balanced by
incongruences so obtained. An S who tends to respond Left will be
spuriously accurate when the answer is Left, but his score will be
attenuated by answers of Right placed on the answer sheet for this
purpose by E. A second procedure involves analysis of the response
record for this same purpose.

(c) Positive semantic bias. Similar correction here is complicated by
E's comparative lack of control over the score. To control biases to
respond Yes, catch stimuli such as blanks are often thrown in (the
reference here is to the use of blanks for score correction, rather
than for disciplinary correction of S when he says Yes to a blank).
From false positives here (coupling of Yes and blank), a correction
factor is drawn for application to Yes responses whose “falsity”
cannot be readily determined, that is, Yes responses made when a
stimulus is presented. If both accuracy and semantic responses are
made to the same stimulus presentation, false positives can be
obtained without using catch stimuli, that is, when S trips himself
by coupling Yes with an inaccurate response. The assumptions upon
which these correction procedures rest have recently been
questioned. This issue will be discussed in the section on decision
processes.

(d) Negative response bias. If response bias can lead to false
positives, it must logically be capable of leading to false
negatives. If correction procedures are used for one, they should
logically be used for the other. As a rule, no correction procedures
are used for false negatives, and the existence of false negatives
defines the subliminal perception effect: S says No and is accurate.
The subliminal effect is accordingly contaminated by a procedural and
logical inconsistency. Further, this one-sided correction decreases
the number of Yesses and can therefore in and of itself produce a
discrepancy or asynchrony effect, unless there is corresponding
correction of false negatives.⁴


⁴ For example, a task will be assumed in which accuracy by chance is
highly improbable (signal in 1 of 1,000 positions). S says Yes 10
times and is accurate 7 times, and says No 10 times and locates the
signal accurately 3 times. This gives the following small table:

                         S's location was:
                         Inaccurate    Accurate
    S reported:   Yes         3            7
                  No          7            3

One way of interpreting this table is to state that S's responses are
quite valid, but he is just as likely to “misuse” Yes as he is No.
Another interpretation involves the establishment of a criterion for
reporting Yes such that for inputs difficult to discriminate,
exclusion of false positives and exclusion of true positives covary
positively, with a high criterion leading to high exclusion rates
(fewer Yes responses of each kind), and a low criterion to low
exclusion rates. Decision theory, to be discussed shortly, is
concerned with such decisions, and might interpret these data in
terms of S maximizing response value by using a criterion producing
3-7 ratios.

A third way is to look at the Accurate column only and note that, of
S's accurate locations of the signal, 30% occurred without awareness!
This is then “explained” by reference to unconscious processes in
subliminal perception.

Any correction for positive or chance bias of the type discussed
would only increase the subliminal effect. In this example, placing
the signal in 1 of 1,000 possible positions eliminated, for practical
purposes, the necessity for a chance correction.

An ingenious dissertation by Zeitlin (197) uses as its central
argument a similar treatment of perceptual defense, sensitization,
and autism. Zeitlin argues that these effects are derivable from
interpretations of the data which concentrate only on specified
entries in a 2×2 table and ignore the total table. Discussion of this
paper will be deferred until the section on these effects.

4. Decision avoidance. Although in both indicators a response may be
required, that is, forced, experimental conditions are often such
that the negative semantic response can be used simply to avoid
making a decision. Culturally, the negative response is preferred to
the positive response for equivocal situations. This amounts to a
cultural bias toward a response for which no correction is made.

5. Cross-experimental validation. The logic of the accuracy indicator
can be independent of the responses used. For example, S is required
to respond Triangle, Circle, or Square when one of these is
presented. The better he can differentiate signal from background,
and the better he can relate signal configurations to the response,
the greater the probability of congruence. This holds whether the
responses are geometric forms, or letters, figures, and so on,
although there are some major differences between certain
procedures. Nuances in instructions will not usually affect this
logic (although complexity of the task may make a difference). The
semantic indicator, on the other hand, is considerably more dependent
on the response and on the instructions, which often give it a
referent. Some semantic responses used have been: great, slight, no
confidence (5); stimulus, something, nothing (34); something, nothing
(186); clearly, doubtful but something, guess (194). Instructions
vary from “Report when you can no longer see the figure” (94, p. 61)
to “be very careful in the matter of giving immediate expression to
even the slightest confidence” (5, p. 89). Systematic investigation
of the effects of such differences is usually not involved in the
studies, and it becomes difficult to compare data and to discuss
semantic results obtained independently of the specific responses
and instructions utilized.
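The chance correction of item 3(a) and the readings of the footnote's 2×2 table can be made concrete. The Python sketch below is illustrative only: it assumes the common correction-for-guessing formula (observed congruence rescaled so that the chance level 1/m maps to 0 and perfect accuracy to 1), which the text does not spell out, and the helper names `chance_corrected` and `read_two_by_two` are hypothetical.

```python
def chance_corrected(p_observed: float, m: int) -> float:
    """Correct a proportion of congruent (accurate) responses for guessing.

    Assumes m equally likely alternatives on E's answer sheet, so pure
    guessing yields 1/m congruence. Chance maps to 0, perfect accuracy
    to 1; the result goes negative below chance.
    """
    if m < 2:
        raise ValueError("need at least two alternatives")
    chance = 1.0 / m
    return (p_observed - chance) / (1.0 - chance)


def read_two_by_two(yes_inacc: int, yes_acc: int, no_inacc: int, no_acc: int) -> dict:
    """Summarize a semantic (Yes/No) by accuracy (Inaccurate/Accurate) table."""
    return {
        # How often each report accompanies an accurate location.
        "p_accurate_given_yes": yes_acc / (yes_inacc + yes_acc),
        "p_accurate_given_no": no_acc / (no_inacc + no_acc),
        # Reading only the Accurate column: the share labelled "without awareness".
        "share_of_accurate_reported_no": no_acc / (yes_acc + no_acc),
    }


# With two alternatives, 75% congruence is only halfway above chance.
print(chance_corrected(0.75, 2))
# The footnote's example table (1 of 1,000 positions, so chance is negligible).
summary = read_two_by_two(yes_inacc=3, yes_acc=7, no_inacc=7, no_acc=3)
print(summary)
```

The same counts support all three readings; which one is reported depends only on which entries of the table are attended to.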

The following sections concern 
systematic investigations of these 
two indicators. 


Semantic and Accuracy Indicators in Recent Psychophysical Research: Validity Differences


Recent work in psychophysics bearing directly on the two indicators
discussed may affect conceptualization and procedures in perception
and mental test theory.

The history of psychophysics can be interpreted as a search for
methods and indicators which control invalidating extraneous sources
of variance. The term, methods, usually brings to mind procedures
such as the Constant Methods, Method of Limits, Average Error, and so
on. There has been considerable research in the procedural issues
raised by these, but comparatively little systematic research in the
procedural issues raised by indicator differences.⁵

An outstanding exception is a monograph (14) and series of studies
(11, 12, 13) by Blackwell. These report experimental comparison of
semantic and accuracy indicators, called phenomenal report and
forced-choice. The phenomenal report indicator is a semantic
indicator standardized to Yes-No. The forced-choice indicator is an
accuracy indicator standardized so that the task involves
differentiating signal-plus-noise from noise alone. A light increment
is added to a uniformly illuminated screen, and appears in one of
several positions (spatial forced-choice), or the increment is
presented during one of several intervals (temporal forced-choice).

The better S can differentiate signal-plus-noise from noise, the
greater the probability of the congruence called accuracy. This
procedure is simpler than those where differentiating one geometric
form from others is the task, since the latter may involve not only
signal-noise differentiation, but also patterning of signals,
differentiating one pattern from another, and a wider array of
responses. Blackwell's monograph reports systematic experimental
variation of the following variables related to indicators, methods,
and extraneous variances (groups within each variable are given in
parentheses): indicator (Yes-No, forced-choice spatial, forced-choice
temporal), number of forced-choice alternatives (2, 4), stimulus
magnitude orders (random, ascending, descending, blocked), block size
(1, 20, 40), number of magnitudes, threshold relations (below, above
and below), motivational and set variables (e.g., variations in
instructions and questions; groups which are “relaxed,”
“motivated”), and knowledge of correctness. Catch stimuli were
employed and varied systematically. The Ss were run for several
sessions, with only one procedure used at a session.

⁵ Notable among systematic studies on indicator methodology have been
those concerned with the response in the P.S.E., and those studies
which led to the elimination of middle categories such as the
Doubtful in Yes, Doubtful, No.

ISRAEL GOLDIAMOND

Comparing semantic and accuracy indicators, Blackwell found that the
Yes-No indicator was greatly amenable to influence from variables
extraneous to sensory discrimination. If these constitute perceptual
error variance, this indicator should, according to test theory,
exhibit less session-to-session reliability and less intrasession
reliability than the forced-choice indicator. This is precisely what
was found; the “data suggest that forced choice rather than
phenomenal report should be used in routine psychological
measurement” (14, p. 118). If this indicator is including extraneous
variance, it should also yield higher thresholds than the
forced-choice indicator, for the same reason that a voltmeter with
steel filings in its bearings will be less sensitive than a clean
voltmeter. Blackwell reported “that forced-choice thresholds were
significantly smaller than corresponding thresholds obtained with
phenomenal report” (14, p. 199). The Yes-No indicator had
considerably less “apparent validity” than forced-choice.

The discrepancies obtained are entirely in keeping with the
subliminal effect. It can be asserted that “a subject can
discriminate intensities too low for him to be aware of them. This
fact is probably best explained in terms of levels of operation of
the nervous system, some involving consciousness and others not—these
levels having different thresholds” (123, p. 265). An alternative
statement involves two indicators of perception, one freely admitting
extraneous variance, and the other not.

An experiment by Goldiamond (70) compares semantic and accuracy
indicators for serial effect, an issue which has recently received
considerable attention. Serial effect refers to the covariation of
responses in a series with preceding responses. Such psychophysical
responses accordingly admit variance extraneous to stimulus
measurement. Goldiamond found that the probability of saying Yes was
a positive function not only of the intensity of the stimulus judged,
but also of the stimulus intensity preceding the one at which
judgment was made; that is, S was more likely to say Yes if the
preceding stimulus had been high than if it had been low. Since high
stimulus magnitudes are associated with high frequencies of Yes
responses made in their presence, the increase in Yes which followed
might relate either to response effects or to stimulus effects. A
paired accuracy indicator did not covary with the same preceding
stimulus magnitude, eliminating stimulus involvement and assigning
the effect to the response. Congruence with E's score sheet defines
this indicator. Since E's score sheet has no serial effect, any
response bias of this kind would not enter into the indicator as a
serial effect.

These two indicators were further compared by the same author (69),
whose Ss gave both phenomenal report and spatial forced-choice
responses to a triangle varied in position and intensity. The Ss were
divided into 6 groups, with visibility reported as (a) Yes-No, (b)
2-1-0, or (c) 7-6-...-0, and knowledge of accuracy (a) given
immediately after each set of responses, or (b) not given. Only one
session was run per S. Accuracy curves were similar for all groups;
the corrected phenomenal report curves for each group, in order of
decreasing response frequency (and increasing subliminal effect),
were: 7-0 without knowledge (minimal asynchrony), 7-0 with, 2-1-0
without, 2-1-0 with, Y-N without, Y-N with (35% asynchrony at
threshold). This suggests that subliminal effects ranging from zero
on up can be wilfully obtained by E through proper experimental
manipulation.

These data were interpreted in terms of stimulus generalization: an S
trained to respond No to the absence of a stimulus, S^Δ, and Yes to
its presence, S^D, may upon the presentation of a faint but
discriminable stimulus be more likely to respond No than Yes, since
the faint stimulus is closer to S^Δ than it is to S^D. Permitting an
intermediate response would lead to more positive responses. The
depressant effects of knowledge of accuracy were related to
elaboration of a 2×2 Utility Table (semantic indicator No-Yes by
forced-choice location Inaccurate-Accurate). The combination
Yes-Inaccurate (stimulus reported seen but in a location where it was
not presented) attached aversive consequences (hallucination, poor
judgment, lie) to responding Yes when S^D and S^Δ were close (72).⁶
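How aversive consequences attached to one cell of a utility table can depress Yes responses may be sketched with the textbook expected-value criterion of signal detection theory. This is a hedged illustration, not Goldiamond's procedure: it assumes equal-variance normal distributions for noise and signal-plus-noise, treats a wrong Yes like a false alarm, and uses hypothetical payoff numbers and a hypothetical function name `optimal_cutoff`.

```python
import math
from statistics import NormalDist


def optimal_cutoff(d_prime, p_signal, v_hit, c_miss, v_cr, c_fa):
    """Cut-off on the observation axis for an expected-value maximizer.

    Assumes noise ~ N(0, 1) and signal-plus-noise ~ N(d_prime, 1), so the
    likelihood ratio at observation x is exp(d_prime * x - d_prime**2 / 2).
    S says Yes when that ratio is at least
        beta = (P(N) / P(SN)) * (v_cr + c_fa) / (v_hit + c_miss),
    where v_* are values of correct outcomes and c_* are (positive)
    costs of errors. Solving for x gives the cut-off returned here.
    """
    if d_prime <= 0:
        raise ValueError("d_prime must be positive")
    beta = ((1 - p_signal) / p_signal) * (v_cr + c_fa) / (v_hit + c_miss)
    return math.log(beta) / d_prime + d_prime / 2


def yes_rate_on_signal(d_prime, cutoff):
    """Proportion of signal-plus-noise presentations reported Yes."""
    return 1.0 - NormalDist(d_prime, 1.0).cdf(cutoff)


# Hypothetical payoffs: symmetric, versus a heavier penalty on a wrong Yes.
neutral = optimal_cutoff(1.0, 0.5, v_hit=1, c_miss=1, v_cr=1, c_fa=1)
penalized = optimal_cutoff(1.0, 0.5, v_hit=1, c_miss=1, v_cr=1, c_fa=4)
for name, cut in (("neutral", neutral), ("penalized", penalized)):
    print(f"{name}: cut-off {cut:.3f}, Yes rate on signals {yes_rate_on_signal(1.0, cut):.3f}")
```

Under these assumptions the penalty moves the cut-off up, so fewer signals are reported Yes even though the separation of the distributions (what S can discriminate) is unchanged.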


⁶ Another interpretation that deserves mention is the
linguistic-experiential one. This states that language classifies our
experiences, a discrete language such as ours doing so discretely. S
is told to label a bright stimulus experience with Yes, and the
absence of such experience with No. He then gets a faint but
discernible stimulus experience. If he must classify this experience
by some response, since it is closer to the experience labeled No
than it is to the Yes experience, he may say No. But he will be able
to locate the stimulus accurately, since he is experiencing it. If
intermediate categories of response are introduced, he may classify
the faint experience with a 2 or 1, and the brighter ones with higher
numbers. Eriksen (54) relates discreteness in language to the
subception effect; the study of language as an instrument classifying
experience has recently received considerable attention from
anthropologists such as Whorf (190), who would hold that it alters
experience, and Hoijer (80), who presents differing views.

The three interpretations given here (decision theory, operant
conditioning, and experiential-linguistic) are in agreement in that
they differentiate perception from the response (cf. 67). Naturally,
they differ on other issues, but on this issue they support the
statement made earlier that if there is agreement on proper tests of
variables, there can be tolerance of differences in added meanings.
The statement by Solomon and Howes (164) that “any variable that is a
general property of linguistic responses must also be a property of
any perceptual concept that is based upon those responses” (p. 257)
would be in similar agreement if altered to refer to perceptual
response, rather than perceptual concept, which differs from it.

A monograph by Smith and Wilson (159) reports variation of the
discreteness of No (reported as (a) Didn't hear but guess, and (b) No)
as well as of Yes (reported as (a) Certain heard, and (b) Think). Such
“liberal” Ss were contrasted with “conservatives” (Yes-No). Liberals
gave more positive responses than conservatives. Had their higher
curves merely been an artifact of increased positive response bias,
as their high false positive rate suggested, then correction for
false positives should have eliminated the superiority of this group,
but “the data consistently failed, by a wide margin, to meet this
assumption” (159, p. 34). Superiority in signal detection was related
to increased false positives. To the usual model of the threshold,
which depicts the probability that S will receive information at a
given presentation as related to distributions of signal-noise
ratios, a third dimension was added. This involved the superimposition
of a distribution of cut-off points for reporting having received
information. “Variations of the observer's attitude toward reporting
moves a cut-off line up or down the dimension of subjective
intensity. Any signal (or blank) exceeding this cut-off point is
reported” (159, p. 34), the authors conclude.

Decision Theory and Signal Detection 

Work by Blackwell which system- 
atically investigates differences be- 
tween indicators, and related research 
in decision theory and signal detec- 
tion constitute a major breakthrough 
in psychophysics and perception. 
This breakthrough is both methodo- 
logical and theoretical, and bids to 
supply new applications of psycho- 
physical techniques as well as in- 
sights into new areas. This discus- 
sion will be concerned with extensions 
to unconscious perception. 

Decision theory is concerned, in part, with the cut-off issue raised
by Smith and Wilson, and with differences between phenomenal report
and forced-choice indicators. It questions certain assumptions held
in most psychophysical work and pertinent to subliminal perception.
Among these are (a) the existence of “such a thing as a sensory
threshold... (rather than) continuous reception of information” (175,
p. 403), and, in this context, “that if some threshold of neural
activity is exceeded, phenomenal seeing results” (177, p. 402); and
(b) that false positives represent guesses or are not related to
signal detection.

Although a distinguishing characteristic of the theory is its
mathematical nature, a verbal presentation, minimally necessary for
understanding its relation to subliminal perception, will be made;
Swets, Tanner, and Birdsall (174, p. 56) explicitly indicate a
relationship. Although the presentation in this discussion is based
on work with simple stimulus dimensions, Egan (176) has reported
extension of the theory to experiments involving two-way
communication of speech. Substantive findings (that is, results of
experimentation in vision and hearing) using these formulations will
not be discussed.

Most simply, the theory starts with the notion of a signal presented
against a background of noise, for example, a tone, or a light
increment against a uniformly illuminated screen. Output of (or
sensitivity to) signal-plus-noise and noise alone will vary, each
around the mean value intended by E. A distribution can be assigned
to signal-plus-noise, and another to noise, with the amount of
overlap dependent on the variances and the closeness of the means. In
this area of obvious overlap (and to a lesser extent elsewhere), each
of these produces similar inputs, and S can call a signal
presentation noise, and a noise presentation signal. S's task is to
decide if a given “observation is more representative of N (noise) or
S+N (signal superimposed on background noise). His task is, in fact
the testing of a statistical hypothesis” (177, p. 403). For Yes-No
curves, he can “establish a cut-off point such that any measure which
exceeds that cut-off is in the criterion” (177, p. 403), and in so
doing, he engages in the risks attached to Type I and Type II
statistical decisions. If he sets his cut-off point low, he accepts
more signals, but also accepts more noise presentations as signals,
that is, more false positives. If he sets his cut-off point high, he
accepts fewer false positives, but rejects more signals.
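The cut-off trade-off can be made concrete with a minimal Python sketch. It rests on an assumption the text does not state: that noise and signal-plus-noise observations are normally distributed with equal variance, separated by a distance called d' here. The function name `yes_rates` is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def yes_rates(d_prime: float, cutoff: float) -> Tuple[float, float]:
    """Return (hit rate, false-alarm rate) for a Yes-No observer.

    Equal-variance Gaussian assumption: noise ~ N(0, 1) and
    signal-plus-noise ~ N(d_prime, 1). S says Yes whenever an
    observation exceeds the cut-off.
    """
    hit = 1.0 - NormalDist(d_prime, 1.0).cdf(cutoff)
    false_alarm = 1.0 - NormalDist(0.0, 1.0).cdf(cutoff)
    return hit, false_alarm


# One signal-noise separation, four cut-offs from lax to strict.
for cutoff in (0.0, 0.5, 1.0, 1.5):
    hit, fa = yes_rates(1.0, cutoff)
    print(f"cut-off {cutoff:.1f}: hits {hit:.3f}, false alarms {fa:.3f}")
```

Lowering the cut-off raises hits and false alarms together, and raising it lowers both; the separation between the two distributions is the same on every line.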


Curves are drawn relating the probability of detecting signals (ordinate, 0-1.00) to the probability of reporting signal presence when only noise was presented (abscissa, 0-1.00). Where there is complete overlap of the signal-plus-noise (SN) and noise (N) distributions, S will report Yes with equal probability for both SN and N; that is, if he says Yes 15% (or, say, 80%) of the time, he will say Yes to this proportion of SN presentations and also of N presentations. The probability of saying Yes will be the same for both situations, producing a diagonal with a slope of 1.00. For a higher signal-noise ratio, that is, a more intense signal, a concave curve bowing above the diagonal is drawn. An even greater signal-noise ratio will produce another curve above this one, the signal-noise ratio being the parameter of the family. These curves are called R.O.C. curves (receiver operating characteristic) and ideally represent the “best that can be done with the information available” (177, p. 104).
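Sweeping the cut-off from high to low traces one R.O.C. curve. The following is a minimal sketch under an assumed equal-variance Gaussian model, with d' standing in for the signal-noise ratio that indexes the family; with d' = 0 the points fall on the diagonal of slope 1.00, and larger d' values give curves lying above it. The function name `roc_points` is illustrative.

```python
from statistics import NormalDist


def roc_points(d_prime, n_points=11):
    """Trace an R.O.C. curve by sweeping the cut-off.

    Returns (false-alarm rate, hit rate) pairs under an assumed
    equal-variance Gaussian model; d_prime plays the role of the
    signal-noise ratio that parameterizes the family of curves.
    """
    std = NormalDist()
    points = []
    for i in range(n_points):
        fa = i / (n_points - 1)                    # abscissa: P(Yes | N)
        if fa == 0.0 or fa == 1.0:
            hit = fa                               # every curve passes through (0,0) and (1,1)
        else:
            cutoff = std.inv_cdf(1.0 - fa)         # cut-off that yields this false-alarm rate
            hit = 1.0 - std.cdf(cutoff - d_prime)  # ordinate: P(Yes | SN)
        points.append((fa, hit))
    return points


for d in (0.0, 1.0, 2.0):
    print(d, [round(hit, 2) for _, hit in roc_points(d)])
```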

It is explicit in the R.O.C. curves that as true positives increase so do false positives (called “false alarms”). Any curve is for a given signal-noise ratio, and since it runs diagonally, increase in ordinate values is associated with increase in abscissa values. “A zero false alarm rate can be attained only at the sacrifice of zero detection rate. As the allowable false alarm rate increases, the increase in detection is first rapid and then slower, not a linear function of the false alarm rate,” states Birdsall (9, p. 397). Indeed, a “wild observer” willing to produce 28% false positives at threshold “would have been found to have a threshold 6.4 db lower than the equally good but conservative (2%) observer” (p. 397) for an energy level cited. Caution raises the threshold.

Forced-choice tasks involve no such criterion; rather, “the largest of a set of likelihood ratios (a relationship between probability densities of detection and false alarms) is chosen and this will be correct unless one of the likelihood ratios due to noise alone exceeded that likelihood ratio due to signal-plus-noise” (9, p. 401).
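The forced-choice rule can be sketched numerically. Assuming one signal-plus-noise alternative distributed Normal(d', 1) and m - 1 noise alternatives distributed Normal(0, 1) (an assumption; Birdsall's formulation is more general), the likelihood ratio grows with the observation, so picking the largest observation is the same as picking the largest likelihood ratio. The probability that this choice is correct is then the integral of φ(x - d') Φ(x)^(m-1) over x, approximated below by a midpoint sum; `forced_choice_correct` is a hypothetical name.

```python
from statistics import NormalDist


def forced_choice_correct(d_prime, m=4, step=0.001):
    """P(correct) when the observer picks the alternative with the largest
    observation among m alternatives (one SN, m - 1 noise).

    Assumed model: SN ~ Normal(d_prime, 1), each noise ~ Normal(0, 1).
    Computes the integral of pdf(x - d') * cdf(x)**(m - 1) dx with a
    midpoint-rule sum over a range wide enough to hold nearly all the mass.
    """
    std = NormalDist()
    lo, hi = -8.0, 8.0 + d_prime
    n = int((hi - lo) / step)
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        # density of the SN observation at x, times the chance that
        # every noise observation falls below it
        total += std.pdf(x - d_prime) * std.cdf(x) ** (m - 1)
    return total * step


print(round(forced_choice_correct(0.0, m=4), 3))
print(round(forced_choice_correct(1.0, m=4), 3))
```

With d' = 0 the result is close to 1/m (.25 for four positions), and for m = 2 it agrees with the closed form Φ(d'/√2).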

From the family of R.O.C. curves generated, the conventional psychophysical curves can be derived. Relation of the conventional threshold to false alarm probability suggests that “the threshold is strictly a monotonic function of the false alarm probability” (p. 397, italics supplied).
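One way to see why the threshold must move with the false alarm probability: under an assumed equal-variance Gaussian model, a signal is detected on 50% of trials exactly when its mean sits at the observer's cut-off, and the cut-off is fixed by the false alarm rate. The sketch expresses that 50% threshold in d' units; it does not reproduce Birdsall's 6.4 db figure, which depends on his energy scale. The 2% and 28% rates echo the conservative and "wild" observers mentioned earlier; the function name is hypothetical.

```python
from statistics import NormalDist


def threshold_at_false_alarm_rate(fa_rate):
    """Signal strength (in d' units) detected on 50% of signal trials by an
    observer whose cut-off produces false-alarm rate `fa_rate`.

    Under the assumed equal-variance Gaussian model, a 50% hit rate means the
    SN mean sits exactly at the cut-off, so the threshold equals the cut-off.
    """
    if not 0.0 < fa_rate < 1.0:
        raise ValueError("fa_rate must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - fa_rate)


# A cautious observer versus a "wild" one: caution raises the threshold.
for fa in (0.02, 0.10, 0.28):
    print(f"false alarms {fa:.0%}: threshold d' = {threshold_at_false_alarm_rate(fa):.2f}")
```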

In addition to the usual notion of a criterion, which involves choice of a particular signal-noise ratio with report of any input above it, Birdsall presents others: establishing a permissible level of false positives and increasing positive report until that level is reached; reporting in terms of veridical proportions of SN and N; and reducing information (Shannon) uncertainty. Tanner and Swets (177) report experiments in which values and costs were announced to S, who could get as much as $2.00 extra in a session. The Ss were given all the information necessary to compute optimization criteria and, although they were mathematically naive, they behaved according to decision theory, Swets and Tanner reported (176). The S acts, according to Tanner (175), not to optimize information, but to maximize expected value. Stated otherwise, S is maximizing the consequences of his responses in accordance with the risks and profits attached, and with past learning. The response is an operant and should not be confused with stimulus discrimination (cf. 67).
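Maximizing expected value can be written as a likelihood-ratio criterion. In the standard decision-theory form (assumed here, not quoted from the sources cited), S should say Yes when l(x) = f_SN(x)/f_N(x) is at least β = [P(N)/P(SN)] × [(V_CR + C_FA)/(V_H + C_M)], where V is the value of a correct response (hit, correct rejection) and C the cost of an error (miss, false alarm). The sketch converts β to a cut-off under an assumed equal-variance Gaussian model and lets one check that expected value peaks there. The payoff numbers are hypothetical, not those used by Tanner and Swets.

```python
import math
from statistics import NormalDist


def optimal_beta(p_signal, v_hit, c_miss, v_correct_rejection, c_false_alarm):
    """Likelihood-ratio criterion that maximizes expected value.
    Values and costs are given as positive magnitudes."""
    prior_odds_noise = (1.0 - p_signal) / p_signal
    return prior_odds_noise * (v_correct_rejection + c_false_alarm) / (v_hit + c_miss)


def cutoff_for_beta(beta, d_prime):
    """Observation cut-off equivalent to `beta` in the assumed equal-variance
    Gaussian model, where l(x) = exp(d' * x - d'**2 / 2)."""
    return math.log(beta) / d_prime + d_prime / 2.0


def expected_value(cutoff, d_prime, p_signal, v_hit, c_miss, v_cr, c_fa):
    """Average payoff per trial for an observer using `cutoff`."""
    std = NormalDist()
    hit = 1.0 - std.cdf(cutoff - d_prime)
    fa = 1.0 - std.cdf(cutoff)
    return (p_signal * (hit * v_hit - (1.0 - hit) * c_miss)
            + (1.0 - p_signal) * ((1.0 - fa) * v_cr - fa * c_fa))


# Hypothetical payoffs: a false alarm costs four times what a hit earns,
# so consequences are unfavorable for Yes.
beta = optimal_beta(0.5, v_hit=1.0, c_miss=0.0, v_correct_rejection=0.0, c_false_alarm=4.0)
cut = cutoff_for_beta(beta, d_prime=1.0)
print(f"beta = {beta:.2f}, cut-off = {cut:.2f}")
```

Raising the cost of a false alarm raises β and therefore the cut-off, so Yes responses become rarer while the sensory input itself is unchanged.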

The conversion of Yes-No curves to forced-choice curves, and the reverse, are reported by Tanner and Swets (177) and by Swets, Tanner, and Birdsall (174). “R.O.C. curves fit the Yes-No data, and forced-choice and Yes-No data are consistent with each other” (175, p. 406). This is a reported rationalization of the two main indicators of the subliminal effect. The effect can arise when the processing of information gives “high” accuracy curves while high cut-offs for signal detection produce “low” Yes curves, leading to a discrepancy or asynchrony between them; that is, S says No and is accurate. These responses are operants governed by their consequences. Where the consequences are unfavorable for Yes, S will produce a large subliminal effect.

Regarding the presence of information in null semantic responses, the evidence for which has been presented in the sections on developments in indicator methodology and decision theory, Tanner considers that “the implications of such presence are tremendous.” If such information is not lost, O is “capable of ordering the ‘below threshold’ events. This means that an observer receives more information than he is able to transmit when he has at his command only a one-bit channel” (175, p. 404), that is, only the two alternatives of Yes-No.

Is there information in null scores of the accuracy indicator? Here there is typically either congruence between S's response and E's score sheet, or a null congruence between them. If S is either accurate or inaccurate, then giving him a second choice when he is wrong (on the first) should produce no further information. Stated otherwise, congruence at second choice should be what one would expect for pure guessing, that is, chance congruence. If, in a four-choice situation, S is wrong on the first try and then tries again, since there are only three possibilities left, null information would be indicated by .33 correctness at this choice. Swets, Tanner, and Birdsall (174) report a proportion of .46 (significant at far beyond the .00001 level). First choices obtained without allowing for second choice, and first choices obtained with such procedures, produced almost identical accuracy (.650 and .651), indicating that information in the first choice was not thereby affected, and leading Tanner to conclude that “first choices are made up of a certain number of correct choices based on supra-threshold outputs, a certain number of incorrect choices based on supra-threshold outputs, plus some pure guesses” (175, p. 404). Further, “it is not unlikely that the third choice will convey information. Thus it is very difficult for an experimenter to determine when enough information has been extracted from forced choices” (174, p. 54). Accordingly, if “one calculates a threshold taking into account the correct second choices, he must arrive at a value far lower than any so far calculated, and at the same time he must realize that his result is still not as low as the true value. ... Such considerations suggest for all intents and purposes an arbitrarily small threshold, which is the same as saying that information is continuous” (175, p. 404), Tanner concludes.
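The second-choice argument can be checked with a small simulation. Under an assumed equal-variance Gaussian model, in which information is continuous, an observer who ranks the four positions by their observations should be right on second choices more often than the .33 expected if a wrong first choice carried no information. The model, the seed, and the number of trials are illustrative assumptions; the simulation is not meant to reproduce the .46 reported by Swets, Tanner, and Birdsall, and `second_choice_accuracy` is a hypothetical name.

```python
import random


def second_choice_accuracy(d_prime, m=4, trials=20_000, seed=1):
    """Proportion of correct second choices, counted only over trials whose
    first choice was wrong.

    Assumed model: position 0 holds signal-plus-noise ~ Normal(d_prime, 1),
    the other m - 1 positions hold noise ~ Normal(0, 1), and the observer
    ranks positions by their observations. If wrong first choices carried
    no information, the result would be 1 / (m - 1), i.e. .33 for m = 4.
    """
    rng = random.Random(seed)
    wrong_first = 0
    right_second = 0
    for _ in range(trials):
        obs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        obs[0] += d_prime                                  # the signal position
        ranked = sorted(range(m), key=lambda i: obs[i], reverse=True)
        if ranked[0] != 0:                                 # first choice wrong
            wrong_first += 1
            if ranked[1] == 0:                             # second choice right
                right_second += 1
    return right_second / wrong_first


print(f"chance level for m = 4: {1 / 3:.2f}")
print(f"simulated, d' = 0: {second_choice_accuracy(0.0):.2f}")
print(f"simulated, d' = 1: {second_choice_accuracy(1.0):.2f}")
```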

The concept of the threshold obviously requires reexamination in the light of this research. The threshold has often been considered as a conceptual stimulus magnitude below which response does not occur, and above which it does. Locating such a point has often proved elusive, since the switch in response called for by this interpretation occurs at differing stimulus magnitudes for the same O. This has often been interpreted in terms of errors of measurement or procedure, or in terms of fluctuations in the sensitivity of S, in stimulus energy, or both. A formulation which gets around this problem is to consider the threshold as a best estimate of this make-break point. The term is also used without such interpretation as a convenient, but seemingly arbitrary, point of 50% response. Setting this convenient point at 50% is, however, not entirely arbitrary, since if fluctuations are normally distributed around a mean, this mean can be the threshold, which can have mathematical significance as the inflection point of the integral of the normal curve. This curve is the psychophysical curve of the phi-gamma hypothesis (cf. 181). Conceivably, the possibility of such a mathematical referent explains much of the staying power of the term threshold in its current usage. Turning to the decision curves discussed, no comparable mathematical significance can readily be attached to this point, nor is there a conceptual significance of a best estimate where information is considered a continuous process. If we use as an example the anonymous ditty about the sad fate of Psychology, who first lost her soul, then her mind, then consciousness, being left with only her behavior (of which the less said, the better), we might wonder where Limen is headed.
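The conventional 50% threshold of the phi-gamma hypothesis can be sketched as follows. The normal ogive reaches .50 at its mean, which is also its inflection point, so interpolating the 50% point from proportions generated by the ogive recovers that mean. The magnitudes, mean, and standard deviation are arbitrary illustrative values, and linear interpolation is only one simple estimation choice (probit fitting is another); both function names are hypothetical.

```python
from statistics import NormalDist


def phi_gamma(magnitude, mean, sd):
    """Predicted proportion of positive responses under the phi-gamma
    hypothesis: the normal integral (ogive) of stimulus magnitude."""
    return NormalDist(mean, sd).cdf(magnitude)


def fifty_percent_point(magnitudes, proportions):
    """Conventional threshold: the magnitude at which the proportion of
    positive responses crosses .50, by linear interpolation between the
    two bracketing stimulus values (magnitudes must be ascending)."""
    pairs = list(zip(magnitudes, proportions))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1:
            if p1 == p0:
                return x0
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("the proportions never cross .50")


# Idealized, noise-free proportions generated from the ogive itself
# (illustrative values only: mean 10, sd 2, arbitrary units).
levels = [6, 8, 9, 11, 12, 14]
props = [phi_gamma(x, mean=10, sd=2) for x in levels]
print(round(fifty_percent_point(levels, props), 3))
```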


Some Concluding Remarks on Subliminal Perception

A methodological analysis of subliminal perception experiments in terms of psychophysical indicator methodology indicates that subliminal perception experiments have as their paradigm a discrepancy or asynchrony between two indicators, one being a semantic indicator, and the other varying. The greatest support for the subliminal effect comes from experiments where the second indicator is an accuracy indicator. These two indicators were systematically compared. One provides considerable control by the experimenter; the other does not. Whereas false positives are usually corrected for in standard psychophysical research, false negatives, in a logical inconsistency, are not corrected for; their existence defines the subliminal effect. This procedural inconsistency can of itself produce a subliminal effect.
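In its simplest form, a false-positive correction of the kind referred to above can be sketched as follows. The formula p* = (p - f)/(1 - f), where p is the proportion of Yes responses to stimuli and f the proportion of Yes responses to blanks, is one widely used chance correction; that it is the particular correction used in every study reviewed here is an assumption. The forced-choice counterpart treats 1/m as the guessing rate. Neither function has any term for a No (or a wrong choice) given to a stimulus that was in fact discriminated, which is the asymmetry at issue. Both function names are hypothetical.

```python
def correct_for_false_positives(p_yes_signal, p_yes_blank):
    """Chance-corrected Yes rate: p* = (p - f) / (1 - f).

    p_yes_signal: proportion of Yes responses when a stimulus was presented.
    p_yes_blank:  proportion of Yes responses on blank (noise-only) trials.
    Only false positives enter the correction; false negatives do not.
    """
    if not 0.0 <= p_yes_blank < 1.0:
        raise ValueError("p_yes_blank must lie in [0, 1)")
    return (p_yes_signal - p_yes_blank) / (1.0 - p_yes_blank)


def correct_forced_choice(p_correct, m):
    """The same correction for an m-alternative forced choice, taking 1/m
    as the guessing rate."""
    return correct_for_false_positives(p_correct, 1.0 / m)


# Hypothetical rates: 60% Yes to stimuli, 20% Yes to blanks.
print(correct_for_false_positives(0.60, 0.20))
# Hypothetical four-position forced choice at 65% correct.
print(correct_forced_choice(0.65, 4))
```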

Recent research by Blackwell and others submitting the two indicators to systematic experimental comparison suggests that semantic indicators tend to admit more variance extraneous to discrimination than do accuracy indicators. They are therefore less sensitive and will produce less response, creating a subliminal effect as an artifact of this contamination. Among invalidating variables discussed were consequences of response, serial effect, and categories of report.

Concern with these two indicators has also been evident in experimentation related to decision theory. This work questions many of the assumptions of a threshold, upon which the notion of subliminal perception rests. Response in a Yes-No situation is viewed as the outcome of a decision process, with S having available to him far more information than he reports. Rather than seeking to maximize information, he optimizes the consequences of his response, in line with work on operant conditioning. Decision variables extraneous to sensory discrimination will influence his response. Analysis of null accuracy responses supports the contention that accuracy is not an all-or-nothing phenomenon, and suggests that thresholds are far lower than hitherto considered. There is considerable evidence to suggest that the “below threshold” of subliminal is considerably above threshold. A merchant who alters price-tags to report the wholesale price as much higher than it is can sell “below cost” and make considerable profit.

In sum, methodological analysis 
suggests, to borrow a phrase from the 
political arena, that, regarding the 
subliminal effect, “there is less here 
than meets the eye.” 


II. SUBCEPTION, PERCEPTUAL DEFENSE, AND VIGILANCE


The term subception first appeared in a report of an asynchrony obtained between two indicators (109), and is similar in this respect to the subliminal perception experiments discussed. Lazarus and McCleary (105) employed visually presented nonsense syllables, some of which were conditioned to shock. These syllables produced GSR's when S was reported unable to identify them correctly. Subception was defined as “a process by which some kind of discrimination is made when the subject is unable to make a correct conscious discrimination.” It will be noted that awareness was not inferred from a semantic indicator, as it is in subliminal perception, but from an accuracy indicator. This indicator is involved in the vast majority of subception, perceptual defense, and perceptual vigilance studies, and has indicator properties which are singular enough to merit a methodological analysis of its own, as a sub-branch of subliminal perception.

In general, these studies report a threshold stimulus magnitude for correct identification of a stimulus such as a word or a drawing. This threshold is related to other variables. In perceptual defense and vigilance studies, differences in thresholds are systematically related to different classes (examples are given in parentheses) of stimuli (socially acceptable words, nonacceptable words), subjects (schizophrenics, depressives), states (stress, nonstress), experimental procedures (allowing completion or noncompletion of tasks), or various interactions between these and other variables. In the subception studies, an autonomic indicator may be paired with this accuracy indicator, and an asynchrony is obtained (autonomic responses paired with null accuracy). This is related to variables of the type mentioned. Currently, there is controversy as to whether the differences obtained relate to perception in the experiential sense (cf. 16), to set (cf. 63), to learning processes (cf. 45), or to statistical and other artifacts (cf. 21, 83). The battle lines drawn are between the experiential proponents and all others, with the former, so to speak, playing the field. “What one sees, what one observes, is inevitably what one selects from a near infinitude of potential percepts,” opens a keynoting experiential article. Perception is “a first line of defense against would-be catastrophic situations and a sensitizer to adaptive opportunities,” the discussion continues (140). The two processes mentioned in this quotation are the categories into which experimental results considered positive are placed. Defense is related to perceptual insensitivity inferred from high thresholds, and sensitization or vigilance is inferred from low thresholds. The psychophysical complexities of the stimulus materials utilized, and differences in procedures involved, make it difficult to compare experiments in terms of differential thresholds, or to obtain norms. Accordingly, the thresholds are not high and low in reference to some standard, but in reference to another stimulus group or subject group run during the experiment. Wherever such a group needs to be specified, it will be placed in parentheses, preceded by vs. Thus, the statement, “Defense effects were found for schizophrenics (vs. normals),” should be read to mean that schizophrenics obtained higher thresholds for the stimuli of the experiment than did normals.

In this discussion, the terms defense effect and sensitization effect will be used to refer to such high and low thresholds in their systematic relationship to variables of the experiments. As in the case of subliminal effect, these terms are used for convenience only and imply neither acceptance nor rejection of any experiential inferences attached to them. As in the preceding section, this discussion will not be concerned with the justifiability of attaching surplus meaning to concept-perception, this being a philosophic issue outside the scope of this discussion. Rather, the discussion will focus on psychophysical indicator methodology, which is minimally common to all definitions of perception. The schematic definition is minimal since, if the indicator is invalidated as an indicator of concept-perception, then it is also invalidated as an indicator of perception defined in any surplus manner. The reverse does not necessarily hold. And, as in test theory, an indicator is invalidated by its admission of extraneous variance, in this case, by its correlation with variables historically considered extraneous to the definition of perception. Studies will be classified according to indicator and method used, and categorized according to substantive issues.


Results from Studies Using the Ascending Method of Limits and Indicator of Total Initial Accuracy


Use of a specific indicator and specific psychophysical method is so widespread in this area as to make these almost generic for the studies under consideration. The method is the ascending Method of Limits, and the indicator of perception is one of initial total identification. An unknown stimulus configuration is first presented at a magnitude considered too low for identification. Stimulus magnitude is then increased in successive presentations until S identifies the configuration in a way considered total, that is, responds with the total word on E's score sheet. The stimulus magnitude at which such identification occurs, usually initially, is called the recognition threshold.
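The ascending procedure and its recognition threshold can be sketched as a simple loop. The `identifies` callback is a hypothetical stand-in for S; the `clarity` function, the exposure series, and the two report criteria are invented for illustration. The usage example shows that two observers receiving identical input, but requiring different degrees of clarity before venturing the whole word, yield different recognition thresholds under this indicator.

```python
def ascending_recognition_threshold(magnitudes, identifies):
    """Ascending Method of Limits with an indicator of total initial accuracy.

    `magnitudes` are stimulus values (e.g. tachistoscopic durations);
    `identifies(m)` is a hypothetical stand-in for S, returning True when the
    report matches the whole word on E's score sheet. Returns the first
    (lowest) magnitude producing total identification, or None if none does.
    """
    for magnitude in sorted(magnitudes):
        if identifies(magnitude):
            return magnitude
    return None


def clarity(duration_ms):
    """Illustrative only: how legible the word is at a given exposure."""
    return duration_ms / 100.0


def make_observer(report_criterion):
    """Hypothetical observer who reports the word once clarity reaches
    `report_criterion`."""
    def identifies(duration_ms):
        return clarity(duration_ms) >= report_criterion
    return identifies


durations_ms = list(range(10, 110, 10))
print(ascending_recognition_threshold(durations_ms, make_observer(0.35)))
print(ascending_recognition_threshold(durations_ms, make_observer(0.65)))
```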

The most common method for varying stimulus magnitude is through control of duration using a tachistoscope (3, 10, 23, 30, 39, 40, 50, 51, 63, 64, 65, 73, 75, 86, 95, 96, 98, 101, 112, 114, 116, 117, 119, 126, 130, 134, 138, 139, 140, 141, 147, 150, 151, 153, 163, 164, 165, 167, 168, 179, 180, 184); indeed, a photograph of a tachistoscope appeared in an introductory psychology textbook issued during this period (76, p. 297). Variation of voltage for visual stimuli has been used in other experiments (53, 56, 68, 74, 106, 111, 136, 142, 143, 144, 158, 165, 166, 171, 189), variation of resistance in another (125), and variation of episcotister angles in yet another (196). Acoustical equipment has been varied in voltage (15, 185), and the intensity of tape recording has been varied (97, 99). Some novel ascending procedures have been carbon copies of typewritten words, from


ISRAEL GOLDIAMOND 


the most smudged to most legible (7, 24, 35, 36, 37, 191); anatomical drawings ascending from a single line to a fully drawn banana inserted into a mouth (107); Gottschaldt figures, from most broken to most full (160); and progressive alteration of focus (62, 161). The discussion to follow concerns studies using this indicator-method and discussion related to the issues they raise.

Reaction to two early studies initiated a pattern of controversy and accommodation which has characterized the field ever since. A subception study by McGinnies (112) reported GSR's upon presentation of taboo words (whore, penis) at stimulus magnitudes which produced null total accuracy, and also reported a defense effect for these words (vs. acceptable words). A study by Postman, Bruner, and McGinnies (140) reported sensitization effects for words related to Ss' values (vs. words unrelated); that is, Ss scoring high in economics on a value scale had low thresholds for economic words, with similar relations for other value groups. The experiential interpretation made in terms of recognition and nonrecognition was challenged by Howes and Solomon (85, 86), who argued that an extraneous variable, word frequency, had been introduced, since the words had not been equated for frequency of usage (164). They also charged suppression of taboo words. McGinnies (113, 117) disagreed with the suppression interpretation and reported obtaining defense effects even when frequency was controlled. Equation for frequency by Postman and Schneider (146) in the values experiment eliminated the sensitization effect for words of high frequency, but not of low frequency.

The following studies concern frequency as a methodological variable. Postman and Conger (142) reported that for low frequency stimulus words S responded with words that were more frequent. Blake and Vanderplas (15) reported more errors before high threshold words (vs. low threshold). Long words (vs. short words) were more affected by frequency, reported McGinnies, Comer, and Lacey (116). The amount of information in a word, rather than its length, was related to recognition thresholds by Miller, Bruner, and Postman (120). DeLucia and Stagner (44) noted frequency effects, but also threshold dependency on variables affecting homeostasis. Vanderplas (184) similarly found frequency effects, but also Gestalt effects of the “organized character of the trace systems involved” (p. 582). Goodstein (73) found that an r = .53 between unpleasantness and thresholds dropped to -.18 with frequency controlled. Newton (129) found more errors for unpleasant words, even with such control. Differing frequencies were “built into” an experiment by Solomon and Postman (165). Nonsense syllables were presented at differing frequencies (from 1 to 25) during a training period; accuracy threshold was a function of such frequency, as it was in a similar experiment by King-Ellison and Jenkins (95), who obtained an r = -.99 between threshold and log frequency. Postman (134) varied training frequency and structural similarity. Frequency lowered thresholds for words of low similarity, but raised them for high similarity words. Postman and Rosenzweig (144) varied training and recognition modalities (auditory and visual); there was facilitation when the same modalities were used, and for auditory thresholds when training was visual, but not the reverse. Noble (131) reported that Ss rated as more familiar words which had been presented more frequently in a training period. Kristofferson (98) found recognition thresholds inversely related to Noble's m, which is contingent on frequency and associations. Frequency has also been related to intelligibility (cf. 84), Howes assigning 69% of intelligibility variance to frequency for words studied.

Currently, the Thorndike-Lorge list is used as a control, as in an experiment by McGinnies and Sherman (117). Postman and Rosenzweig (145) report threshold-frequency relations for French words. Davids (41) reported that frequency of association to words related to their personal relevance, not to “frequency per se” (p. 335). Similarly, Daston (40) found sensitization effects for words frequently given by Ss in therapy, giving him grounds for support of idiosyncratic rather than “usage table” frequency.

Defense effects have been found in the McGinnies studies cited (112, 113, 117), by McGinnies and Adornetto (114) for both schizophrenics and normals to sexual words (vs. neutral), by Cowen and Beier (35, 36), who used blurred carbon copies of sexual words (vs. neutral), and by Kleinman (97) using emotional words (vs. neutral) for the psychogenic deaf (vs. organic). Eriksen (52) found defense effects by paranoids against hostile words (vs. other categories). Spielberger (168) found that thresholds dropped as the experiment progressed except where stutterers (vs. nonstutterers) responded verbally (vs. in writing) for stutter-arousing (vs. nonarousing) words; thresholds continued high here, leading E to assign this defense effect to response suppression.

On the other hand, sensitization effects have been found to double-entendre words (fairy, pussy, balls, screw) in a sexually connoting list (vs. nonsexual) by Wiener (191); by Kurland (99) for emotional words (vs. neutral) for hysterics and obsessives (vs. normals); by Lindner (107) for sexual pictures (vs. nonsexual) for sex offenders (vs. other criminals); and by Daston (39) for homosexual words (vs. others) for paranoids (vs. schizophrenics), in contrast to Eriksen (52), who did not. Chodorkoff (27) reported sensitization to threatening words (vs. neutral) by better-adjusted Ss in therapy who were “getting to know, as quickly as possible, what it is that is threatening” (p. 511). In another study (29), only absolute change (defense or sensitization) was significant. McClelland and Liberman (111) found neither support nor rejection for hypothesized relations between thresholds and n Achievement, but Eriksen (51) reported sensitization to aggressive pictures (vs. nonaggressive) by Ss giving aggressive T.A.T. stories. Greenbaum (74) reported that high anxiety Ss (vs. low anxiety) displayed sensitization to hostile faces.

The argument has been raised (cf. 78, 87) that these effects may be ad hoc, since, given two groups to be compared, if they are not equal, one must be higher than the other. Accordingly, for any inequality, a substantive result of defense or sensitization must be obtained. The strength of a hypothesis being the ease with which it can be disproven, the need-perception hypothesis may be jeopardized. Spence (166) and Chodorkoff (29) have argued that either outcome is substantive. Where defense and sensitization are not explanatory concepts, Postman (135) states, they are legitimate opposing principles; E, however, should “anchor the concept in antecedent conditions” (p. 299). Such an experiment is reported by Stein (171), who first classified Ss as sensitizers and defenders, and then obtained corresponding threshold effects, as did Carpenter, Wiener, and Carpenter (24). The Ss diagnosed as inhibited displayed defense effects against sexual words, and reported response suppression, Kissin, Gottesfield, and Dickes (96) found. Cowen, Heilizer, Axelrod, and Alexander (37) found no relation between Taylor Anxiety scores and perceptual scores.

The words used visually in the Postman, Bruner, and McGinnies (140) study of values were transcribed verbally for a tape recorder by Vanderplas and Blake (185), with similar sensitization effects. The Ss ranked traits in a study by Haigh and Fiske (75), who obtained defense effects for liked traits (vs. least liked). Rosenthal (152) noted differences in use of value related to religion and science as against likes and dislikes; the Allport-Vernon scale of values initially used had been considered as confounding interest and value by Adams and Brown (2). Traits named desirable also produced defense effects (vs. undesirable), Postman and Leytham (143) reported. Negative results were obtained by Gilchrist, Ludeman, and Lysak (68) for Ss rated on an Anti-Semitism scale (high vs. low), with nouns projected (Jew vs. Ink), accompanied by adjectives (opprobrious, approving, neutral). Evidence of withholding (rather than defense against) derogatory words such as nigger could be elicited “only in unusual circumstances” (p. 733), such as a Negro E running a Negro S, Whittaker, Gilchrist, and Fischer (189) reported.

McClelland and Atkinson (110) noted an increase in food responses as a function of deprivation. Wispé (195) reported this relationship was nonlinear. Sensitization effects to food words (vs. neutral) for deprived Ss (vs. little deprivation) were reported by Wispé and Drambarean (196). Taylor (180) failed to find such a relationship.

Capital letters related to experimentally-induced failure (vs. nonfailure) yielded defense effects for Postman and Brown (137). Stressed Ss (vs. nonstress) exhibited defense effects against Gottschaldt figures gradually made more complete by Smock (160); response perseveration was also displayed. Postman and Bruner (138) reported that badgered Ss responded with aggressive and escape words. Eriksen and Browne (56) related sensitization effects for anagrams to learning principles rather than to perception. Both sensitization and defense effects were obtained by Postman and Solomon (147) in a Zeigarnik effect study for completed and uncompleted tasks. Eriksen (53) found that Ss who forgot failure-associated words also exhibited defense effects to them. Spence (167) reported sensitization to failure words.


In the following studies, presentation of a visible word, the standard, is followed by a word to be recognized using the ascending Method of Limits. Neutral recognition words produced defense effects when the standard was taboo (vs. acceptable), McGinnies and Sherman (117) reported, adducing this as evidence for a perceptual rather than a response-suppression effect. High anxiety Ss (vs. low) exhibited defense when the standard was threatening (vs. nonthreatening), Smock (163) reported. Cofer and Shepp (30) varied synonymity of standard and variable, reporting lowered thresholds for more synonymous (vs. less) words. Taylor (179) obtained defense when Ss were not told of a relation between variable and standard (vs. informed).

In a training session, Rigby and Rigby (151) associated token awards [5, 2, 0] and withdrawals [-3] with words. The threshold values of such words ranked themselves, lowest first, as 5, 2, -3, 0. Newton (130) supplied nickels [1, 0, -1]. There were defense effects against withdrawal words.

Pronunciation of a given word terminated shock; Reece (150) reported sensitization effects for that word (vs. a nontermination word).

Neisser (126) gave Ss a list of words which E said would appear. There was sensitization to these words rather than to homonyms and unrelated words. Since homonyms involved the same responses as test words, E concluded that set “facilitates the perception of specific visual patterns” (p. 402), that is, seeing, not saying. Ross, Yarczower, and Williams (153) varied similarity of homonyms (be-bee vs. phrase-frays), obtaining a nonlinear relationship to thresholds.

Defense effects have been obtained when common situations have been rendered uncommon (vs. expected situations). Thus, Bruner and Postman (23) presented playing cards with, for example, red spades. Postman, Bruner, and Walk (141) reversed letters. Smock (161) presented figures half-man, half-woman. The Ss required to discriminate along one dimension had lower thresholds than in situations where stimuli could belong in two categories, Postman and Bruner (139) reported. Similarly, Freeman and Engler (65) set one group for color words, another for food or color. Multiple set lowered thresholds for low frequency words, but raised them for high.
Engler and Freeman (50) set one group for animals, another not. An “impairment of recognition” of words not appropriate to the set was reported. Stein (172) reported that as exposure time of Rorschach cards increased, so did F+ responses. This procedure enabled study of “the perceiving process ... as it unfolds itself” (p. 356).

Generally, where S has been given some foreknowledge (vs. no foreknowledge) of words to be shown, defense effects have disappeared: for obscene (vs. acceptable) words used by Lacy, Lewinger, and Adamson (101), and similarly for taboo (vs. acceptable) words presented by Postman, Bronson, and Gropper (136), for hostile (vs. neutral) words by Smith (158), and for taboo (vs. fruit, neutral) words by Freeman (63, 64), who reported “no evidence for perceptual defense against taboo words when Ss have been set by instructions to look for and report such words” (63, p. 285). Mausner and Siegel (119) found no relation between monetary value assigned to postage stamps and thresholds, but the session assigning arbitrary values should be considered as constituting a foreknowledge session. Development of insight was the interpretation given by Bitterman and Kniffin (10) to the threshold drop in taboo (vs. neutral) words as the experiment progressed. Chodorkoff (28) asked why insight had not developed toward neutral words. Beier and Cowen (7), however, reported defense effects to emotional (vs. neutral) words (whore, anger) even with forewarned Ss, as did Aronfreed, Messick, and Diggory (3) for unpleasant words (vs. pleasant) with forewarning. Lawrence and Coles (103) presented a list after stimulus presentation as well as before. Both groups had equally lowered thresholds (vs. control), leading the Es to conclude that response facilitation and memory were involved.


Studies Using Forced-Choice Indicators

A group of studies reports use of a “psychophysical method, adopted from Blackwell . . . (who has) reported detection thresholds of a highly reliable character obtained with this method” (4, p. 39), that is, spatial forced-choice, with four positions.


Complex stimulus configurations are 
generally presented, with at least one 
picture relevant or made relevant to 
S. Sensitization is inferred from in- 
creased call of a location containing 
this critical picture (usually vs. non- 
critical), and defense from such de- 
creased call. 

Blum (16) presented dogs from his Blacky test. The S reported which stood out best. A critical dog (masturbating, oral-aggressive) and neutral dog (carefree) were shown with S told to consider “when you might have felt” this way (p. 95). The perceptual task was then repeated at a duration representing the same “low level of awareness” (p. 95), giving a sensitization effect for the critical dog (vs. first test) predicted from psychoanalytic theory. Exposure time was then increased to one involving previous recognition of “one or more of the pictures” (p. 96). The S was asked to locate the critical dog and was scored for accuracy. Accuracy frequency was lower than preceding standing-out frequency in accord with a predicted repression (defense effect). Smock (162), arguing that in a complex stimulus the “properties which determine whether the stimulus ‘stands out’ clearly at .03 seconds” are not identical to those “which determine the ease of correct discrimination at .20 seconds” (p. 70), attempted to provide such a control, and obtained sensitization effects, with analogous defense effects not as clear.

In a second experiment, Blum (17) classified Ss according to conflict areas of the 11 Blacky pictures. The same four dogs were always shown. The S was told all 11 would appear equally and was to identify each dog presented “subliminally” (p. 25). The score was not accuracy but number of calls for each of the 11 dogs. Critical dogs not shown were called with frequencies similar to neutral dogs not shown, arguing against response suppression. There was, however, a significant defense effect against critical dogs projected (vs. neutrals projected), hence Blum argued “perceptual defense can be traced to the perceptual process itself” (p. 27), which can be considered unconscious since “subjective reports show that the phenomenon was not a conscious one” (p. 27).


An experiment by Nelson (127) combined procedures used in both Blum experiments. Defense effects were obtained with the call-frequency indicator, and sensitization effects with the stand-out indicator. Blum (18) called out a location, with S required to say “Blacky doing so-and-so” (p. 171). Number of calls and accuracy were compared. Where accuracy was low, there were fewer calls for critical dogs (vs. neutral). Where accuracy was high, there were more calls for critical dogs (vs. neutral), leading E to suggest a “defense against a defense” (p. 173). Atkinson and Walker (4) presented a face (critical) and lamps, plates, and the like, using the stand-out indicator. High n Affiliation Ss displayed sensitization to faces (vs. low n Affiliation Ss). Pustell (148) used a similar indicator with geometric figures, the critical figure having been associated with shock. Males displayed vigilance effects (vs. neutral figures) and females tended to defense: “one way to reduce the drive was to avoid seeing its stimulus wherever possible” (p. 434).

Dulaney (45) attached differential 
consequences to responses involving 
which of four positions stood out 
best. In one case locating a critical 
figure led to shock; in the other not 
placing it did so. Geometric figures 
were presented at a “level of aware- 
ness too low to be named.” Critical 
places dropped in the first case and 
rose in the second, leading E to conclude that “perceptual defense and vigilance are learned reactions to anxiety arousing stimuli” (p. 337).

Goldiamond (71) scored Ss on ac- 
curacy of forced-choice location of a 
triangle started at 0 intensity and 
gradually increased. The S received 
immediate information on accuracy, 
and had to correct himself. An ex- 
perimental group was told ESP was 
involved. As the stimulus increased 
in magnitude, E argued, these Ss 
would learn that its position was re- 
lated to reinforcement, and when this 
occurred, S would start responding to 
the stimulus. Since S^D would then be
at high magnitude, the curve would 
take a sudden spurt. This sharply in- 
flected insight-type curve was ob- 
tained; controls produced psycho- 
physical ogives. 
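The contrast between the two curve shapes can be made concrete with a small model. The Python sketch below gives a control observer a cumulative-normal psychometric ogive and gives the ESP-instructed observer chance accuracy until some stimulus level, after which he responds to the stimulus and follows the same ogive. The four-position chance rate, the ogive parameters (`mu`, `sigma`), and the switch level are hypothetical values chosen for illustration; they are not Goldiamond's data or procedure.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def control_curve(level: float, mu: float = 5.0, sigma: float = 1.5,
                  chance: float = 0.25) -> float:
    """Psychometric ogive: a chance floor plus a cumulative-normal rise."""
    return chance + (1.0 - chance) * normal_cdf((level - mu) / sigma)


def esp_instructed_curve(level: float, switch_level: float = 7.0,
                         mu: float = 5.0, sigma: float = 1.5,
                         chance: float = 0.25) -> float:
    """Hypothetical insight curve: chance accuracy until S starts responding
    to the stimulus, then the same ogive as the control observer."""
    if level < switch_level:
        return chance
    return control_curve(level, mu, sigma, chance)


def largest_step(curve, levels) -> float:
    """Largest increase in accuracy between adjacent stimulus levels."""
    values = [curve(x) for x in levels]
    return max(b - a for a, b in zip(values, values[1:]))


levels = range(0, 11)  # hypothetical intensity steps, starting at 0
for x in levels:
    print(f"{x:2d}  control={control_curve(x):.3f}  esp={esp_instructed_curve(x):.3f}")
print("largest step, control:", round(largest_step(control_curve, levels), 3))
print("largest step, ESP group:", round(largest_step(esp_instructed_curve, levels), 3))
```

Under these assumptions the control curve rises gradually while the ESP curve stays flat and then jumps; the size and location of the jump depend entirely on the assumed switch level.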


Studies Using Miscellaneous Indicators

Hypothesizing that shock might produce a “startle response (which) . . . interferes with recognition and recall of briefly presented material” (p. 15), Hochberg, Haber and Ryan (79) sounded a buzzer simultaneously with, and also after, presentation of a nonsense syllable, the buzzer being followed by shock. Both conditions significantly lowered correct reports of critical (vs. neutral) syllables.

Lowenfeld, Rubenfeld and Guthrie (108) required identification of nonsense words presented at levels of 40-60% accurate identification, the critical words having been previously associated with shock. They obtained higher GSR’s to critical (vs. noncritical) words presented. That the GSR followed the veridical critical word rather than critical word responses was considered to support subception. Rubenfeld, Lowenfeld and Guthrie (154) presented variations on a rectangle, and on a triangle, shock being applied to a rectangular figure. Veridical shock figures misreported were associated with greater GSR’s than misreported veridical triangles, leading the authors to conclude that stimulus generalization had occurred in subception.


Threatening and nonthreatening instructions were employed by Moffitt and Stagner (124) who presented a completed geometric figure, and then tachistoscopically presented random modifications. The groups differed significantly, with Ss under stress “clinging to one interpretation of an ambiguous figure while it is changing to another” (p. 355). Zeitlin (197) trained Ss on nonsense syllables, two of which were coupled with an annoying noise for a punishment group, with monetary gains for a reward group, and neither for a control. The punished and rewarded words received more correct calls than the controls when presented tachistoscopically at the same low intensity-durations; E related this to a response shift rather than perceptual change since these words were also called out more often when blanks were used.

An experiment by Eriksen and 
Wechsler (57) related the amount of 
information in S’s response to the
number of responses used, and the 
consistency with which he used them. 
Absolute judgment was used for 11 
squares differing in size, but with a 
range of only 4 jnd’s. The Ss classi- 
fied as anxious and nonanxious were 
almost identical in discriminative in- 
formation (1.871 and 1.877 bits) and 
very close to the 2.000 bits possible 
on the basis of 4 ideal discrimination 
responses. Anxious Ss, however, were 
lower in response equivocation, that 
is, more fixed in the use of responses 
available to them; both groups, the 
Es stated, perceived similarly, but re- 
sponded differently. In another 
study, Eriksen (54) associated shock 
with the middle square. One group 
identified the square by 1-11, and the 
other group identified the middle 
square by 6, all others being No. 
GSR’s were taken for all Ss. E reported that the number of available responses was a significant determinant of the subception effect. Increasing the number of responses affected “verbal-response generalization but has no effect upon the generalization of the GSR” (p. 360). The
GSR was considered nondiscrete, 
while verbal responses were discrete. 
A continuous lever movement was 
compared with verbal response in 
another experiment by Eriksen (55). 
Verbal responses were 1-11 and Yes- 
No. The lever was moved through a 
seen and unseen arc. The different 
responses correlated with each other 
and with stimulus changes, there be- 
ing significant differences among the 
latter correlations. E concluded that response and perception were experimentally differentiable; differences
between responses did not necessarily 
reflect perceptual differences, but 
could reflect response errors. 
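Eriksen and Wechsler's comparison is stated in bits. A minimal sketch using the standard definition of transmitted information, T = H(S) + H(R) - H(S,R), shows where a 2.000-bit ceiling comes from when only four categories can be distinguished. The count matrices below, including the 3-3-3-2 grouping of eleven squares, are invented for illustration and are not the published data.

```python
import math


def entropy(counts) -> float:
    """Shannon entropy, in bits, of a list of nonnegative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def transmitted_information(matrix) -> float:
    """T = H(S) + H(R) - H(S,R) for a stimulus-by-response count matrix.

    matrix[i][j] is the number of times response j was given to stimulus i.
    """
    stimulus_totals = [sum(row) for row in matrix]
    response_totals = [sum(col) for col in zip(*matrix)]
    joint = [cell for row in matrix for cell in row]
    return entropy(stimulus_totals) + entropy(response_totals) - entropy(joint)


# Four stimuli, each always given its own response: log2(4) = 2 bits.
perfect_four = [[25 if i == j else 0 for j in range(4)] for i in range(4)]

# Eleven squares that S can only sort into four size classes
# (a hypothetical 3-3-3-2 grouping): no more than log2(4) bits get through.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
eleven_into_four = [[10 if groups[i] == j else 0 for j in range(4)]
                    for i in range(11)]

print(round(transmitted_information(perfect_four), 3))
print(round(transmitted_information(eleven_into_four), 3))
```

The second matrix transmits slightly less than 2 bits because its four response classes are not used equally often.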

The assumption of all-or-nothing identification upon which any total identification indicator rests was challenged in the subception area by Bricker and Chapanis (21) and Murdock (125) as well as by studies by Eriksen and others cited. Bricker and Chapanis asked Ss to make additional guesses when wrong, until E said “Right.” The guesses turned out to be nonrandom. Murdock argued that if “all-or-none identification did occur, the incorrect guesses should be randomly distributed among all possible alternatives” (p. 566). Evidence was found in both experiments for rejection of an all-or-nothing characterization of accuracy. Accordingly, regarding perceptual defense, sensitization, and subception, Murdock concluded that the Es were “unable to exclude the possibility that S obtained some information from stimuli which were wrongly identified” (p. 571). Voor (187) combined analysis of information in null semantic responses with subception effects. Shock was associated with crucial syllables, which were then presented at levels yielding accuracy from 0-50%. Three different indicators were concurrently employed: the autonomic GSR indicator, a naming accuracy indicator, and a semantic indicator (seen, doubtful, blind guess). Voor reported that S tended to use noncrucial responses when in doubt, and from comparison of the autonomic-accuracy asynchrony with the autonomic-semantic asynchrony, concluded that the subception effect rested upon information in null semantic responses.
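Murdock's argument amounts to a test of uniformity: under all-or-none identification, wrong guesses should spread evenly over the incorrect alternatives. One standard way to check this, not necessarily the statistic Murdock used, is Pearson's chi-square against equal expected counts. The tallies below are hypothetical, and the critical value is the tabled chi-square value for alpha = .05 with 3 degrees of freedom.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts.

    Returns (statistic, degrees_of_freedom).
    """
    k = len(counts)
    expected = sum(counts) / k
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, k - 1


# Hypothetical tallies of wrong guesses over four incorrect alternatives,
# ordered from most to least similar to the stimulus actually shown.
random_looking = [12, 13, 12, 13]
similarity_ordered = [30, 10, 5, 5]

CRITICAL_05_DF3 = 7.815  # tabled chi-square critical value, alpha = .05, df = 3

for label, counts in [("random-looking", random_looking),
                      ("similarity-ordered", similarity_ordered)]:
    stat, df = chi_square_uniform(counts)
    verdict = "reject" if stat > CRITICAL_05_DF3 else "retain"
    print(f"{label}: chi2={stat:.2f}, df={df}, {verdict} all-or-none")
```

A tally skewed toward alternatives resembling the stimulus is the pattern that would indicate partial information in erroneous responses.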

The foregoing review of the litera- 
ture is not intended as an exhaustive 
account of experimentation in this 
area. The attempt, rather, has been 
to indicate the variety of methods employed, the scope of variables investigated, and major conclusions and inferences drawn. Since the validity of
these conclusions and inferences must 
rest upon the validity of the indica- 
tors used, the discussion will turn to 
indicator methodology. 


INDICATOR ANALYSIS: THE ASCENDING METHOD OF LIMITS AND THE INITIAL TOTAL ACCURACY INDICATOR


The psychophysical method which 
is almost generic for these studies is 
the ascending Method of Limits, and 
the indicator which occupies similar 
prominence is an accuracy indicator 
involving initial total identification. 
This combination has great face validity. If E wants to find out the minimal energy level at which S will identify a stimulus, the most sensible approach would seem to be to start out with the stimulus unrecognized, and then raise the energy level until it is identified. Once identified, memory may enter as a variable in the response. It will be a contention of this discussion that the face validity of this indicator is not convertible into other types of validity, and that one would have to go far in the experimental literature to find a more invalid indicator of concept-perception than this indicator coupled with the method employed. It will be further argued that many of the differing extraneous variables thus far discovered, for example, frequency and foreknowledge, can be subsumed under the more general variables elicited through an examination of indicator methodology. Such an analysis may also serve to uncover substantive material which has been overlooked in the controversy, and may also serve to suggest further research. Indicator analysis will be presented under appropriate headings.

Response Bias, Accuracy, and Congruence

Accuracy has been defined as the congruence between S’s responses and E’s score sheet. Presumably, signals follow the patterns on E's key, but this is not necessary. Bias has been defined in the sampling sense, namely, that not all responses will have an equal probability of appearing in the response sample. To indicate how these could interact to produce defense and sensitization effects, an example will be presented: S answers a multiple choice examination of the objective type, with 5 alternatives; E’s key sheet is punched so that alternative B is correct, straight down the line. Given two groups of Ss, one with a slight tendency to respond with a B, Group BR, and one without such a tendency, Group OR, then on this basis alone, BR should produce B responses earlier than OR, and therefore be correct earlier, that is, at a lower question number. If, now, instead of numbering the questions in ascending integers, 1, 2, 3, ..., m, the usual procedure, E numbers them in ascending durations, .01, .02, ..., m seconds, and calls the first congruence the recognition threshold, BR should also have a lower recognition threshold expressed in time, voltage, carbon number, or whatever procedure is used. Instead of varying groups, we can vary correctness of choice, and will get a lower threshold for C in a 5-alternative question than in a 10-alternative question, since there are biases toward the middle response.

Such response biases can in and of themselves account for the results cited. Lindner (107) reported that sexual offenders showed greater “perceptual sensitization” to sexual pictures than other prisoners. An ascending series of pictures, from a line on up, was used. The sexual offenders were “more sexually responsive than were the controls to sexually stimulating, nonsexually stimulating, and to ambiguously constructed test items” (p. 373). As a matter of fact, the mean number of sex responses for the sex offenders was 10.2; for the controls, 3.2. For the same reason, one would expect sensitization effects for words frequently used by Ss in therapy, as Daston (40) reports. In a like manner, Elliott and Wittenberg (49) suggest that the accuracy with which anti-Semites identify Jewish faces is a function of a response bias to use the label, Jew, for when there is a minority of Jews in the sample, the Ss become inaccurate.
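The multiple-choice example of Groups BR and OR can be checked numerically. The sketch below assumes, hypothetically, ten ascending durations, a detection probability that rises linearly from 0 to 1 and is identical for both groups (so perception is held constant), and guessing rates of .20 (no bias toward the keyed B) and .30 (a slight bias toward B). The "threshold" is the expected duration of the first congruence with E's key.

```python
def first_congruence_distribution(detect_probs, p_guess_keyed):
    """Probability that the first response matching E's key occurs at each step.

    detect_probs[k]: chance that S actually identifies the stimulus at step k
        (the same for both groups, so perception is held constant).
    p_guess_keyed: chance that a guess happens to be the keyed answer.
    """
    distribution = []
    not_yet = 1.0  # probability of no congruence on any earlier step
    for d in detect_probs:
        q = d + (1.0 - d) * p_guess_keyed  # probability of congruence on this step
        distribution.append(not_yet * q)
        not_yet *= 1.0 - q
    return distribution


def expected_threshold(durations, detect_probs, p_guess_keyed):
    """Mean duration at which the first congruence is scored."""
    dist = first_congruence_distribution(detect_probs, p_guess_keyed)
    return sum(t * p for t, p in zip(durations, dist))


steps = 10
durations = [0.01 * (k + 1) for k in range(steps)]      # .01, .02, ... seconds
detect_probs = [k / (steps - 1) for k in range(steps)]  # same for both groups

group_or = expected_threshold(durations, detect_probs, 1 / 5)  # no bias toward B
group_br = expected_threshold(durations, detect_probs, 0.30)   # slight bias toward B
print(f"Group OR threshold: {group_or:.4f} s")
print(f"Group BR threshold: {group_br:.4f} s")
```

Because the detection curve is shared, any threshold difference produced here comes from the guessing bias alone.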

The bias may be idiosyncratic, as these examples suggest, or it may be a linguistic one. It is under this simple bias heading, rather than in terms of recency, association, generalization, competing responses, and the like (cf. 165), that the effect of word-frequency upon recognition thresholds (cf. 82, 83) can be subsumed. Accordingly, it would be expected that “common color words such as black, brown, and grey” will produce congruences earlier (rather than be “recognized more quickly”) than “some of the more esoteric words, e.g., indigo and azure” (139, p. 372). Data by Zipf (198) indicate a logarithmic relationship between the frequency of a word and its rank order of use. Recognition thresholds are rank-order data.
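Zipf's relation is commonly written as a power law, f(r) proportional to 1/r^s with s near 1, which plots as a straight line of slope -s when log frequency is graphed against log rank. A minimal sketch with an assumed exponent of 1 and an invented corpus size illustrates the relationship; it does not reproduce Zipf's counts.

```python
import math


def zipf_frequencies(n_words, total=1_000_000, s=1.0):
    """Expected counts for ranks 1..n_words under f(r) proportional to 1 / r**s."""
    weights = [1.0 / r ** s for r in range(1, n_words + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) on log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


freqs = zipf_frequencies(1000)
print(f"most frequent / 10th most frequent: {freqs[0] / freqs[9]:.1f}")
print(f"log-log slope: {loglog_slope(freqs):.3f}")
```

If guessing tends to follow frequency of use, a word's rank in such a table predicts how early it will be emitted, and so how early it can be congruent with E's key.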


Biases may also characterize cultural groups; psychological terms are far more frequently used by the author than geologic terms, and he would expect congruence rank to follow suit. Needs would also bias frequency, for when thinking of a pay raise, and mustering arguments for one, monetary terms would become frequent. Indeed, an increase in food
responses as a function of hunger has 
been reported (110). Chodorkoff's 
(27) report of sensitization to threat- 
ening words by Ss progressing in 
therapy might mean that they could 
now verbalize problem areas. Re- 
sponse consistencies should also be 
expected. The discovery that Ss who 
forget words associated with failure 
will have heightened thresholds for 
them (53) translates simply to the 
statement that words whose frequen- 
cy is lowered by failure will display 
this negative response bias in a task 
where a stimulus is increased in mag- 
nitude as well as in a memory task. 
Failure can also, for certain Ss, in- 
crease response bias (167). 

Results on the effects of stress can 
be simply explained if it is assumed 
that the experimental stress utilized 
led to response stereotypy. If an S 
under stress starts off with the wrong 
response (as he should if there is a 
large number of alternatives), he will 
perseverate with this incongruence 
and have a “high threshold,” that is, 
be congruent later than an S not 
under stress, as the various studies 
cited would indicate (cf. 50, 137, 160, 
163). 

Neisser’s (126) conclusion that “seeing rather than saying” is involved is based upon an assumption
of response similarities in homonyms; 
the verbal motor mechanisms may be 
identical, but it is questionable if this 
is an exclusive definition of the re- 
sponse. Lay and lei have different 
frequencies of occurrence. 

Conditioning is, of course, a class- 
ical way of biasing responses, that is, 
altering frequency of occurrence so 
that it is not random. This would 
subsume the studies in which fore- 
knowledge was employed. Here, it 
will be recalled, results almost uniformly indicate sensitization, rather
than defense. Instructions to human 
Ss serve the same function as training 
procedures for other animals. Thus, 
training S to certain responses can al- 
ter frequency ratios to eliminate 
previous bias (cf. 3, 10, 63, 64, 101, 
103, 136, 158), or can create such 
through training, as in the built-in 
frequency experiments (62, 95, 134, 
144, 165). 

Wiener (191) utilized the same 
words (fairy, balls, pussy, screw) 
in two different contexts assuming 
thereby that frequency had been con- 
trolled. This increased the bias toward 
the words in the sexual context. The 
words have little in common on a 
nonsexual basis; categorization into a 
discrete sexual category should facili- 
tate generalization as the data on 
semantic generalization suggest (cf. 
100). 

The organism enters the perception situation with built-in response biases, that is, he has been shaped by preceding conditionings. Certain of these biases are so regular as to enable us to recognize him by them; personality presumably relates to such biases. This statement should not be taken to mean that personality variance in perception is excluded from affecting perception by the fiat of being defined as extraneous. It does mean that before personality variables can be accepted as affecting concept-perception, it must be demonstrated that their effect is not via a response bias extraneous to perception. The same consideration would hold for other variables such as need, learning, hedonic value, and the like.⁷

⁷ For stimuli presented at the same intensity, Zeitlin (197) argues that response shifts may be sufficient to explain the reported perceptual shifts of autism, perceptual defense, and sensitization. His argument can be reformulated in the following manner. E’s presentations and S’s responses can be entered into a 2×2 table in terms of the interaction of presentations-responses which for each are (a) Negative, that is, irrelevant to a need under consideration (words such as school, church, in a food or taboo experiment), and (b) Positive, that is, relevant to the need, whether the need is “positive” (love, steak), or “negative” (hate, bitch). If E flashes both negatives and positives equally, and S's response biases are similarly equal, there will be a .25 entry in each cell. Now, assume that the need dimension is food words, S is food-deprived, and his responses shift to .70 relevant, and .30 nonrelevant. The entry becomes:

                                    E presents (as before)
                                    Nonrelevant    Relevant
                                       .50           .50
S now responds:  Relevant     .70      .35           .35
                 Nonrelevant  .30      .15           .15

The changes in entries (from .25 each) can be interpreted in terms of response shift. The S is as accurate as he was before (.15 true-negative plus .35 true-positive equals previous .25 true-negative plus .25 true-positive), and as inaccurate, but his responses have shifted.

On the other hand, one can, as in the subliminal effect cited previously, ignore the whole table. Concentrating only on certain entries, E can obtain the following “perceptual” shifts, explainable by “unconscious processes”:

1. False-positives (Need response when Nonneed presentation): autism. Autism has not been discussed (cf. McClelland and Atkinson (110) food experiment cited) since such studies generally utilize measures other than the threshold data to which this discussion is restricted.

2. True-positives (Need response when Need presentation): perceptual sensitization.

3. False-negatives (Nonneed response when Need presentation): suppression, repression, perceptual defense.

Since sensitization and defense are complements, the increase in sensitization in the example given means a decrease in defense.

Liberties taken with Zeitlin’s presentation include reversing rows and columns to make the presentation accord with mental test theory and presenting defense as a false-negative and complementary on the table to sensitization.

This unpublished dissertation was brought to the attention of the author after he had sent a copy of this discussion for criticism to Donald T. Campbell, of Northwestern University, to whom the author is indebted for his prompt loan of his personal copy of the dissertation.

The effect of the interaction of this factor with partial identification of a discriminated stimulus needs little elaboration. A couple of letters discriminated may provide the occasion for a response which has been previously reinforced under similar conditions. If this response has a higher probability than others, and if this bias leads to quick congruence, S will display sensitization effects. If S’s bias does not agree with the score sheet, there may be defense effects. Similarly, partial recognition may lead to suppression of a response which in the past has been followed by aversive consequences.


Total Identification


This indicator-method involves total identification of the stimulus for the congruence to be scored, that is, the complete word must be named. This can be considered an all-or-nothing notion of accuracy with a vengeance. Bricker and Chapanis (21), Murdock (125), and Eriksen (55) have indicated that for the perceptual studies under discussion there is information contained in erroneous responses, and Tanner (175) and Swets, Tanner, and Birdsall (174) indicate more generally for even simpler situations of psychophysical measurement that there is information in errors, that the threshold is arbitrarily small, and that information is continuous, that is, not discretely divided into information and no-information.


Properties of the Stimulus


Stimulus-response relationships can be classified into various categories. For example, a stimulus
which follows a response and increases the probability of that response is called a reinforcing stimulus. The stimulus preceding that response may be a discriminative one. These statements can be rephrased to speak of the reinforcing property of the stimulus, and the discriminative property (93, 157). A further distinction can be made between operant and respondent conditioning. These distinctions are not always clear in the literature cited. When a shock is presented “simultaneously” with a stimulus, subtle differences in timing can produce shock effects contingent upon response, rather than produce classical conditioning. This will affect results markedly.

Except for the GSR’s of subception, the response in these experiments (and most psychophysical and other perceptual experiments as well) is almost invariably an operant. This
means that the response given to a 
discriminative stimulus is not evoked 
by it, but will occur in the presence 
of the discriminative stimulus if such 
response (in the presence of the 
stimulus) has related to it a history of 
reinforcement. A triangle presented 
on a screen may evoke a GSR if it has been respondently conditioned to shock, but there is nothing about it to evoke the verbal responses “Triangle,” “Yes,” “7”; these are operants. Response will depend upon training procedures and reinforcing stimuli, as well as discriminative stimuli. Cessation of the response can define anxiety (cf. 58) as well as nondiscrimination. In
short, the response will co-vary with 
all variables that affect operants. 

It follows that if the stimulus magnitude of the discriminative stimulus is too low for discrimination, reinforcement of an ensuing response may lead to “superstitious behavior” (93, pp. 102 ff.), that is, there will be reinforcement of the response occurring in the presence of a stimulus other than the discriminative one put in by E. Response will be occasioned by an “irrelevant” stimulus. Accordingly, the magnitude of S^D not only affects the discriminability of S^D, the usual function attached to magnitude, but also the extent to which reinforcement can shape the discriminative response.

These two properties of stimulus 
magnitude should be kept separate; 
one involves discrimination or per- 
ception, and the other the application 
of reinforcement, a learning variable. 
The statement, “If the correctness 
of the response depends largely upon 
the characteristics of the stimulus, 
... this might be called perceptual 
behavior” (106, pp. 316-317) would 
seem to blur this distinction. In the 
experiment by Goldiamond cited 
(71), these two functions were separated. In this experiment, increase of magnitude of S^D increased the relationship between response to S^D and reinforcement, leading S to respond according to S^D rather than according to ESP instructions he had been
given. Stated otherwise, once S had 
learned that a stimulus was being 
presented, the discriminative proper- 
ty of that stimulus came into play, 
producing a switch from a zero con- 
gruence to one at high level. The 
“Ah-ah” phenomenon reported by 
Miller (121) under similarly decep- 
tive circumstances may relate to this 
inflection point. 

This suggests another invalidating 
variable which may be attached to 
the particular combination of indi- 
cator-method under discussion. In 
the usual discrimination experiment, 
where the functional relation between 
response and stimulus magnitude is the subject of investigation, S is well-trained in what it is that he must respond to. Discrimination can then be related to other variables, such as ablation, stress, dark-adaptation, and so on. Using the indicator-method combination under discussion, S must learn during the experiment what it is he is supposed to respond to. Conceivably, some of the differences obtained in the various thresholds could be related to differences in learning variables rather than perceptual ones.


Extraneous Variance and Concept-Perception

It follows from this discussion that 
the validity of the accuracy indicator 
involved as an indicator of concept- 
perception is questionable. Since its 
apparent validity as an indicator of 
perception, with any surplus meaning 
attached to the term, rests upon its 
validity in the minimal sense, its 
status as a valid indicator of percep- 
tion with any other historical defini- 
tion may be challenged. 

Indeed, the review of the literature 
presented can be reread with some- 
what more clarity if one substitutes 
diminished response for defense effect 
and for raised thresholds, and in- 
creased response for sensitization ef- 
fect and for lowered threshold. 
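The 2×2 response-shift arithmetic credited to Zeitlin (197) earlier in this analysis makes the same substitution of response change for perceptual change, and it can be verified directly. The sketch assumes, as the footnote's numbers imply, that responses are independent of what E presents, so each cell is the product of a presentation proportion and a response proportion. The cell labels follow the footnote; the function names are illustrative.

```python
def response_shift_table(p_present_relevant, p_respond_relevant):
    """2x2 proportions when responses are independent of what E presents.

    Keys are (S responds, E presents). Independence is the assumption that
    makes this a pure response shift: nothing about the stimulus is used.
    """
    pr, rr = p_present_relevant, p_respond_relevant
    return {
        ("relevant", "relevant"): rr * pr,                    # true-positive: "sensitization"
        ("relevant", "nonrelevant"): rr * (1 - pr),           # false-positive: "autism"
        ("nonrelevant", "relevant"): (1 - rr) * pr,           # false-negative: "defense"
        ("nonrelevant", "nonrelevant"): (1 - rr) * (1 - pr),  # true-negative
    }


def accuracy(table):
    """Proportion of responses that match E's presentation."""
    return table[("relevant", "relevant")] + table[("nonrelevant", "nonrelevant")]


before = response_shift_table(0.50, 0.50)
after = response_shift_table(0.50, 0.70)  # food-deprived S, as in the footnote's example
for cell in before:
    print(cell, f"{before[cell]:.2f} -> {after[cell]:.2f}")
print(f"accuracy: {accuracy(before):.2f} -> {accuracy(after):.2f}")
```

The "sensitization" cell rises from .25 to .35 and the "defense" cell falls to .15, while overall accuracy stays at .50.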


INDICATOR ANALYSIS: FORCED-CHOICE AND MISCELLANEOUS STUDIES

The “psychophysical method, adopted from Blackwell” for some of the forced-choice studies to be discussed because it yields “detection thresholds of a highly reliable character” (4, p. 39), deviates in certain major aspects from Blackwell's procedures (cf. 11, 12, 14). These deviations may admit extraneous variance and thereby may affect the reliability as well as the apparent validity of the indicators used.

Blackwell uses forced-choice as an accuracy indicator. The studies in this section obtaining defense effects (4, 16, 17, 45, 127, 148, 162) use forced-choice as a semantic indicator. The difference between these two indicators has been discussed in the section on subliminal perception, and these alternative uses of forced-choice may be clarified by an example from multiple-choice procedures. A question can read: "1. Who discovered America? (a) Columbus, (b) Balboa, (c) Drake, (d) Raleigh." Or it can read: "2. Who was the greatest explorer? (a) Columbus, (b) Balboa, (c) Drake, (d) Raleigh." Question 1 uses forced-choice as an accuracy indicator, since it has a correct answer, and accuracy presumably indicates knowledge of history. Question 2 uses forced-choice as a semantic indicator; there is no correct answer. The choices made may relate to opinions, biases, and other variables relating to decisions, such as, possibly, the nationality of the teacher. This would also be the case when E asks which picture stands out best, or the related question of which is clearest. No picture is clearest; hence, no designation can be accurate. The Ss are asked to name pictures, but accuracy is not the prime concern of E. This differentiation into accuracy and semantic indicators is important for consideration of indicator validity, since Blackwell (14) and other investigators (69, 70) have reported that the semantic indicator is especially prone to covariation with variables extraneous to perception, and Tanner (175) and other investigators in decision theory (9, 159, 176, 177) have reported that the semantic indicator automatically includes decision processes governed by maximization of response utility rather than of information.* These studies further question the meaning of S's null response when he reports that he does not see the stimulus, a response that has led some of the Es cited to infer that the phenomena involved were not conscious.
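The contrast between a forced-choice item with one correct alternative and a guessing baseline can be made concrete with the standard equal-variance Gaussian model of m-alternative forced choice, in which the wrong alternatives are identical noise and the correct one is signal plus noise. This is a sketch under that assumption, not Blackwell's procedure: the observer is assumed to choose the alternative with the largest internal value, and the sensitivity parameter d' and the function names are introduced here for illustration.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prop_correct_mafc(d_prime, m=4, lo=-10.0, hi=10.0, steps=4000):
    """Probability of choosing the signal alternative in an m-choice task.

    Assumes one observation from N(d', 1) (signal plus noise) and m - 1
    independent observations from N(0, 1) (identical noise alternatives);
    the observer picks the largest. P(correct) is the integral of
    pdf(x - d') * cdf(x) ** (m - 1), evaluated with the trapezoid rule.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end weights
        total += weight * normal_pdf(x - d_prime) * normal_cdf(x) ** (m - 1)
    return total * h

for d in (0.0, 1.0, 2.0):
    print(f"d' = {d:.1f}: P(correct) = {prop_correct_mafc(d):.3f}")
```

With d' = 0 the result is the chance level 1/m (.25 for four choices), and it rises toward 1 as d' grows. An item whose wrong answers differ from one another, like the history question, does not satisfy the model's identical-noise assumption.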

These considerations, and the lack of control for information in inaccurate responses as well as in null semantic responses, would seem to make it unnecessary to go into detail about the specific experiments, some of which are highly ingenious. This discussion is not concerned with issues raised by the apparent predictive power of theoretical positions handled adroitly, but it does question the relationship of the data to perception.


* This discussion concerns the classification of the forced-choice indicator used in these studies as a semantic indicator rather than as an accuracy indicator. The apparent validity of Blackwell's forced-choice indicator derives from its methodology, rather than from its use of forced-choice or its classification as an accuracy indicator. As the ascending Method of Limits studies indicate, use of an accuracy indicator is no guarantee of validity per se. Some of the criteria involved are discussed by Birdsall (8) for his methods; "signal known exactly" is one criterion.

The multiple-choice examination question given should be considered only an example, since it would probably be an accuracy indicator readily admitting extraneous variance. The more conventional forced-choice indicator would involve structuring the wrong answers as identical to each other, with the correct answer differing along the dimension of discrimination. A four-choice situation might have (a) signal plus noise, (b) same noise, (c) same noise, (d) same noise. The example given has a correct answer and three different wrong answers.

Difficulties of this kind stand in the way of ready application of these procedures to mental test theory. There seems little reason to doubt that application can be made of the notion of information in incorrect answers in multiple-choice tests, currently scored entirely on an all-or-nothing basis.
PERCEPTUAL DEFENSE REVIEWED

An enthusiastic supporter of early subliminal perception experiments and their relation to subconscious phenomena was William James, who regarded exploration in the latter, that is, the "extra-marginal and outside the primary consciousness," as "the most important step forward that has occurred in psychology since I have been a student of that science" (90, p. 228). Binet's work with hysterics was cited approvingly, and subliminal perception was one type of incursion into consciousness "of which the subject does not guess the source, and which, therefore, take for him the form of unaccountable impulses to act" (p. 229). This theme is repeated in the present flurry over subliminal advertising and in the reported motivation of many of the studies reported in this discussion; it was to this theme that Titchener and Pyle (182) addressed themselves when they concluded, apropos Dunlap's (46) subliminal Müller-Lyer effect, "that if the subconscious is to be received into experimental psychology at all, it must find some other means of access than these imperceptible shadows" (p. 109). The enthusiasm of James for these subliminal phenomena was matched by his contempt for the psychophysics of his day: the methods were "laborious," and Fechner was considered as having pedantically "tabulated no less than 24,574 separate judgments" (89, p. 23).

James is mentioned because the correlation between his attitudes toward unconscious perception and psychophysical methodology is a highly negative one; this correlation would seem at best to have become a zero one in recent times. The recent reviews which surveyed, in part, the areas discussed did not refer to current advances in psychophysics, nor is this atypical. Psychophysical methodology may seem on the surface to have very little relevance to human behavior when compared to the substantive personality discoveries of the unconscious perception studies discussed, but it would appear from this discussion that an understanding of psychophysical methodology is highly relevant for understanding the personality studies and for evaluating and discovering what their substantive contributions are.

It is one of the aims of this paper, and of others to follow, to draw attention to the significance of current psychophysical research as a methodological tool of great power for investigating those variables with which students of habitual behaviors, and of those considered abnormal, have long been concerned. The research in indicator methodology by Blackwell may provide techniques to isolate not only the extraneous variance in the perceptual response related to the historical variables discussed, but also the perceptual variance itself. If needs (or drugs) do affect perception, it should show up here. Some of the implications of the work in decision theory appear to be evident. For example, the willingness to take risks, as indicated by placement on R.O.C. (receiver operating characteristic) curves, conceivably relates to a past history of reinforcements and aversive consequences attached to venturesome behaviors. Another commonality that this research has with operant conditioning is the emphasis it places on the consequences of the response. Comment could also be amplified on the relatedness of these areas to an economics which constructs utility tables from the two variables of the organism's behavior and the environmental consequences.
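The link between utility tables and placement on an R.O.C. curve can be illustrated with the textbook signal detection result for an observer who maximizes expected payoff. This is a sketch under assumptions the text does not state: equal-variance Gaussian noise and signal-plus-noise distributions separated by d', payoffs entered as positive magnitudes, and hypothetical function names. Making hits more valuable (a more "venturesome" payoff) lowers the optimal cutoff, buying more hits at the price of more false alarms.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def optimal_beta(p_signal, value_hit, cost_miss, value_cr, cost_fa):
    """Likelihood-ratio criterion that maximizes expected payoff.

    All payoffs are positive magnitudes; cr = correct rejection, fa = false alarm.
    """
    p_noise = 1.0 - p_signal
    return (p_noise / p_signal) * (value_cr + cost_fa) / (value_hit + cost_miss)

def criterion_for_beta(beta, d_prime):
    """Decision-axis cutoff at which the likelihood ratio equals beta.

    Noise ~ N(0, 1) and signal ~ N(d', 1), so LR(x) = exp(d' * x - d'**2 / 2).
    """
    return math.log(beta) / d_prime + d_prime / 2.0

def hit_and_false_alarm(cutoff, d_prime):
    """Rates of saying Yes on signal trials and on blank trials."""
    hit = 1.0 - normal_cdf(cutoff - d_prime)
    false_alarm = 1.0 - normal_cdf(cutoff)
    return hit, false_alarm

d = 1.0
for label, payoffs in [("symmetric", (1, 1, 1, 1)), ("hits pay 3x", (3, 1, 1, 1))]:
    beta = optimal_beta(0.5, *payoffs)
    cut = criterion_for_beta(beta, d)
    hit, fa = hit_and_false_alarm(cut, d)
    print(f"{label}: beta={beta:.2f} cutoff={cut:.2f} hits={hit:.3f} false alarms={fa:.3f}")
```

Sensitivity (d') is the same in both rows; only the payoff table, and hence the point chosen on the curve, changes.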

It is no accident that perception has attracted the interest of investigators of personality. The experiments reviewed, and clinical practice itself (as judged by tests currently in use), would seem to support the view that S will reveal a great deal about himself, which he would otherwise not do, by the way in which he approaches a perceptual task. The suggestion made here is that how he responds to this situation provides a key to understanding a past history of built-in response biases, effective reinforcements, aversive situations, and the like. These are classically not perception variables. This interpretation would seem to be more in accord with the evidence presented than one which implicates perception in these differences. Garner, Hake, and Eriksen (67) have recently reasserted a necessary distinction between perception and its response indicator (or the response from which it is inferred), and have pointed out that more than one measure may be necessary for an operational definition.

In view of the difficulty of concluding that perception in its narrowest sense is involved in these experiments, it becomes difficult to see how interpretations attaching even more meaning can have support. Such differences in interpretation of the response have not been debated in this discussion, but a note of caution might be introduced to the effect that the addition of surplus experiential meaning to the response may become experimentally hazardous, though this does not follow from logical necessity. Accustomed to attaching common-sense meanings to responses such as "I didn't see," E may assume that it is the semantic referent of the response (the visual experience) that entitles him to attach indicator properties to it. In actuality, it is not the common-sense referent of the response that determines the extent to which indicator properties can be attached to it, but its methodological adequacy and usage in relation to concept-perception. The use of phenomenal report indicators in psychophysics is predicated on a long history of successful procedures and data related to their use, and not upon their having a semantic referent. The response "I see" is not accepted as an indicator of vision when made by a blind man, nor, for that matter, by many Es when S has vision worse than 20/20. Thus, the danger of using the indicator for subjective inferences is that such use tends to lead E to classify responses on a semantic basis rather than in methodological terms. For example, E may state: "These responses relate to vision because S says he sees the stimulus when I show it." The methodological classification is then not made because of the ease and face validity of the experiential one. This might explain, in part, why personality and social psychologists, who have been taught since Freud's day, at least, to regard S's explanation of his experiences and behavior with suspicion, are found vigorously defending a position which amounts to stating that if S said he didn't perceive it, then he didn't (why should he lie?). In effect, they wind up defending a spurious operational position which they have detested and rightly battled in their own area, namely, that perception is what the perceptual response measures.

It is questionable whether the investigations discussed bear on perception, and it is precisely in this lack of bearing that the importance of the experiments may lie. Their extraneous variance is relatable to language, learning, and personality; the perceptual response seems to be a rich source of such information. Before the extraneous variance can be utilized, however, it must be differentiated from the perceptual variance. Both constitute response variance. The extraction process would seem to require greater attention to psychophysical methodology than that indicated by the William James correlation.


SUMMARY AND CONCLUSIONS 
A review of the literature pertaining to unconscious processes in perception was undertaken because preceding reviews had not taken into account current developments in psychophysical indicator methodology, which, it was felt, might clarify issues in this area, especially since these developments concern the two types of indicators most widely used.

The term indicator was defined as the response element, the dependent variable, in an experiment conducted according to perception methodology. The validity of this response as an indicator of perception was discussed, and variance historically demonstrated to be extraneous to perception was considered as tending to invalidate it. No stand was taken as to whether perception should be considered a concept defined by its methodology, a subjective experience, or a sensation implying such an experience, this being considered a philosophic issue irrelevant to the conduct of methodologically adequate research. Validity as an indicator of concept-perception was considered crucial to the discussion, since if the indicator is invalid in this minimal sense, it must also be invalid as an indicator of perception in any broader sense.

The experimental literature in subliminal perception, subception, perceptual defense, and perceptual sensitization was reviewed in terms of indicator methodology. The greatest support for subliminal perception comes from experiments in which there is a discrepancy or asynchrony between an accuracy indicator and a semantic indicator, such that S is accurate at times when he reports no awareness of the stimulus. This has been called "discrimination without awareness." The indicator referents of subception are an autonomic indicator and an accuracy indicator, such that autonomic responses occur when S has not identified the stimulus correctly. The indicator referent for most perceptual defense and vigilance studies is the accuracy indicator used for subception. Defense refers to systematically raised thresholds obtained with this indicator, and sensitization to systematically lowered thresholds. The threshold comparison is in terms of classes of stimuli, Ss, experimental conditions, or combinations of these.

The accuracy and semantic indicators of subliminal perception were described and systematically compared. The differences between the two relate essentially to the fact that the accuracy indicator does not score the response itself but the congruence of that response with E's score sheet, whereas the semantic indicator scores the responses themselves. These differences necessitate different controls for bias. Among these controls is the standard correction for false positives, that is, for reporting Yes in the absence of a stimulus. No corresponding control for false negatives is made, and this logical inconsistency defines the subliminal effect.
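The difference in what each indicator scores, and the one-sided bias control, can be sketched in a few lines. The correction shown is the conventional correction for guessing, p* = (p − f)/(1 − f), where p is the proportion of Yes reports on stimulus trials and f the proportion of Yes reports on blank trials; the data and function names are hypothetical. As in the text, nothing analogous is computed for false negatives.

```python
from fractions import Fraction

def accuracy_indicator(responses, score_sheet):
    """Accuracy indicator: score the congruence of each response with E's sheet."""
    matches = sum(r == s for r, s in zip(responses, score_sheet))
    return Fraction(matches, len(score_sheet))

def yes_rate(reports):
    """Semantic indicator: score the reports themselves (True = 'Yes, I see it')."""
    return Fraction(sum(reports), len(reports))

def corrected_for_false_positives(p_yes_stimulus, p_yes_blank):
    """Standard correction for guessing: (p - f) / (1 - f).

    Only Yes reports on blank trials (false positives) are corrected for;
    there is no matching correction for No reports on stimulus trials.
    """
    return (p_yes_stimulus - p_yes_blank) / (1 - p_yes_blank)

stimulus_reports = [True] * 6 + [False] * 4   # hypothetical reports on stimulus trials
blank_reports = [True] * 2 + [False] * 8      # hypothetical reports on blank trials
p = yes_rate(stimulus_reports)
f = yes_rate(blank_reports)
print("accuracy:", accuracy_indicator(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
print("raw yes rate:", p, "corrected:", corrected_for_false_positives(p, f))
```

Exact fractions are used only so that the arithmetic is easy to check by hand.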

Recent research in indicator methodology, which bears directly upon these two indicators, was considered. This research tends to indicate that the semantic indicator readily admits variance extraneous to perception and, accordingly, yields higher thresholds and lower reliability than accuracy indicators. Research in decision theory was also examined for its implications for these two indicators. This research challenges certain of the assumptions upon which much standard psychophysical research is based. Among the considerations of decision theory is that S can increase true positives, that is, detections, only at the price of increasing false positives. Accordingly, the semantic indicator automatically involves a decision process as to which level of risk will be taken. Experimentation tends to support the conclusion that S tends to maximize the utility of his response rather than information. The response can be considered an operant governed by its consequences. Regarding accuracy indicators, research here has supported the contention that congruence is not all-or-nothing and that there is information present in inaccurate responses.
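The claim that detections can be increased only at the price of false positives is exactly what an R.O.C. curve plots. The following sketch assumes the equal-variance Gaussian model (an assumption, not something the text specifies) and sweeps a hypothetical Yes/No cutoff. Lowering the cutoff raises hits and false alarms together, and when d' = 0 every point lies on the chance diagonal, so any apparent "detection" is bought entirely with false positives.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def roc_points(d_prime, cutoffs):
    """(false-alarm rate, hit rate) for each Yes/No cutoff on the decision axis.

    Assumes noise ~ N(0, 1) and signal plus noise ~ N(d', 1); S says Yes
    whenever the observation exceeds the cutoff.
    """
    return [(1.0 - normal_cdf(c), 1.0 - normal_cdf(c - d_prime)) for c in cutoffs]

cutoffs = [2.0, 1.0, 0.0, -1.0]   # from a cautious to a risky observer
for fa, hit in roc_points(1.5, cutoffs):
    print(f"false alarms={fa:.3f} hits={hit:.3f}")
```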

The accuracy indicator used in subception, perceptual defense, and sensitization was investigated. This indicator is coupled with the ascending Method of Limits. It was argued that the results obtained using this indicator-method combination could be accounted for on the basis of response bias. Ss whose response biases were in the direction of the entries on E's score sheet would achieve response-entry congruence earlier than Ss without such biases and, since ascending stimulus magnitude is coupled to temporal sequence, would appear to have a lower recognition threshold. Another invalidating extraneous variable considered is that this method requires S not only to discriminate the stimulus but also to learn which stimulus it is prior to discriminating it. Since this indicator as currently used also requires total recognition for a congruence score, this was discussed in light of research demonstrating information in errors.
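"Information in errors" can be illustrated with a second-choice analysis: when S's first choice among four alternatives is wrong, a second choice that is still correct more often than chance (1/3 of the remaining three) shows that the "incorrect" response carried partial information. This is a Monte Carlo sketch under the equal-variance Gaussian assumption, offered in the spirit of the research mentioned rather than as a reproduction of it; the trial count, seed, and d' values are arbitrary choices made here.

```python
import random

def second_choice_correct_rate(d_prime, m=4, trials=20000, seed=1):
    """Among trials whose first choice is wrong, fraction whose second choice is right.

    Alternative 0 carries the signal, N(d', 1); the others are noise, N(0, 1).
    The first choice is the largest observation, the second the next largest.
    """
    rng = random.Random(seed)
    first_wrong = second_right = 0
    for _ in range(trials):
        obs = [rng.gauss(d_prime, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(m - 1)]
        ranking = sorted(range(m), key=lambda i: obs[i], reverse=True)
        if ranking[0] != 0:
            first_wrong += 1
            if ranking[1] == 0:
                second_right += 1
    return second_right / first_wrong

print(f"chance for the second choice: {1 / 3:.3f}")
for d in (0.0, 1.5):
    rate = second_choice_correct_rate(d)
    print(f"d' = {d}: second choice correct on {rate:.3f} of first-choice errors")
```

An all-or-nothing congruence score would treat every one of these first-choice errors identically, discarding the above-chance second choices.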

Studies using forced-choice procedures were also examined. These procedures were considered as involving semantic indicators rather than accuracy indicators and, accordingly, as open to the admission of extraneous variables which tend to invalidate them as indicators of perception.

Accordingly, it is questionable whether the studies cited indicate discrimination without awareness, unconscious processes in perception, and the like, or demonstrate discrepancies in and between indicators. These discrepancies can be functions of pairing an apparently valid indicator with one made less sensitive by admitting invalidating variance, or of using procedures which artificially inflate thresholds and thereby make it appear that processes related to the receipt of information are going on at below-threshold levels.

It is concluded that most of the substantive contributions of the experiments reviewed cannot be demonstrated to be related to perceptual variables, and this is probably where their importance lies. The S, in responding to a perceptual situation, tends to respond in terms of the consequences of his response and in relation to other nonperceptual variables which probably characterize his responses in other areas as well. This is information he would probably not give under other circumstances, making the perceptual response a fertile one for investigating behaviors needed for assessment.
SPONTANEOUS ALTERNATION BEHAVIOR
Yale University


Beginning with the observations of Tolman (37) and Dennis (6), there has developed over the past three decades a fairly extensive literature on spontaneous alternation. Attention to this phenomenon has increased recently with the renewal of interest in exploratory and curiosity behavior. Concern with alternation has had varied bases. To some it represents in simple form a special case of exploratory behavior (e.g., 3, 23). For others it provides a convenient approach to determining the course of performance decrement and, in this connection, a means of investigating the action of a reinforcer (e.g., 38, 47). A third interest lies in the potential use of alternation as an indicator response for the study of general psychological processes such as memory and perception (e.g., 4, 7, 26). Finally, alternation has been considered a problem area in its own right, worthy of its own theoretical structure (e.g., 14, 38).

Our purpose in this paper is to review the alternation literature in an attempt to codify empirical findings and relate them to theoretical issues. To accomplish this aim, the material has been organized in ways which cut across the areas of interest mentioned above. The paper is divided into five main sections. Discussion is directed first at some general problems in the definition and measurement of alternation behavior. Then the major theoretical issues concerning the general explanation of alternation are examined. In the third section the more specific problem of the sources of alternation, i.e., what is alternated, is considered. The fourth section deals with the data available on variables which affect alternation, and the final section treats the relation between alternation behavior and learning theory.

It should be noted that this review is confined to the literature on alternation behavior in rats. There are several studies which reveal analogous behavior in human Ss (e.g., 18, 33, 36, 44), but so far these experiments have added little to the information obtained from rat investigations. Alternation in infrahuman species other than rats has been investigated (e.g., 22, 42), but not sufficiently to warrant review.


DEFINITION AND MEASUREMENT 


The typical paper on alternation provides a denotative definition of the sort: ". . . if an animal turns left in a T maze on its first trial, and if it is immediately returned to the starting point, the probability is quite high that it will turn right on the second trial" (39, p. 19).

More abstractly, alternation implies the occurrence of at least two mutually exclusive categories of behavior over at least two successive time periods. Typically, a "time period" is a trial, and a "category of behavior" a choice-point response in gross, qualitative form. Now, if there are two such categories of behavior, A and B, and two trials are considered, four behavior patterns are possible: A followed by A, or B by B (repetitions), and A followed by B, or B by A (alternations). If there are more than two possible categories of behavior in a situation, as for example in a maze with three arms, alternation then becomes any nonrepetition on successive trials. Whatever the number of trials given, alternation or repetition usually refers to the behavior pattern over any two successive trials.
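The definition above (any nonrepetition over two successive trials, whatever the number of categories) translates directly into a scoring rule. A minimal sketch; the letter codes for arms and the function name are illustrative.

```python
def alternation_proportion(choices):
    """Proportion of successive-trial pairs on which the choice changed.

    Works for any number of categories (e.g., a T maze or a three-arm maze),
    since alternation is simply nonrepetition of the previous choice.
    """
    pairs = list(zip(choices, choices[1:]))
    if not pairs:
        raise ValueError("need at least two trials")
    alternations = sum(prev != curr for prev, curr in pairs)
    return alternations / len(pairs)

print(alternation_proportion(["L", "R", "L", "L"]))   # two-arm maze: 2 of 3 pairs alternate
print(alternation_proportion(["A", "B", "C", "A"]))   # three-arm maze: every pair alternates
```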

If alternation is considered purely a response process, the problem of definition is reduced merely to the specification of mutually exclusive responses, e.g., turning right or turning left. But alternation may involve a reaction to intra- and/or extramaze stimuli instead of, or in addition to, responses. If so, the specification of mutually exclusive categories of behavior becomes somewhat complex. In some situations, alternation of responses may require repetition of stimuli. Conversely, alternation of stimuli may require repetition of responses. Thus, it is often necessary to specify alternation with respect to particular stimulus or response dimensions. Just what these dimensions, or sources of alternation, may be is, of course, a problem for experimental analysis.

A further problem arises in the statistical treatment of alternation data. Where there are two categories of behavior and two trials, two repetition patterns and two alternation patterns are possible. Under the null hypothesis that the behavior on Trial 2 is independent of that on Trial 1, the a priori probability of alternation, p₀, is .50. The significance of an obtained percentage of alternation would ordinarily be evaluated against this expected value, p₀ = .50.
Such a procedure, however, involves an assumption which is not always met empirically, namely, that the two categories of behavior are equally likely. The alternation tendency may not be the only factor leading to the nonindependence of Trial 1 and Trial 2 behaviors. There may also exist a repetition tendency: each S behaves in a particular way on Trial 1 and, if nothing is changed, should behave in the same way on Trial 2. When, as is often the case, there is initially a strong preference for A or B, p₀ = .50 overestimates the a priori expectation of alternation. A more reasonable estimate of chance alternation would take into account the distribution of Trial 1 behaviors. Thus, if 80% of the behaviors on Trial 1 were of Type A and 20% of Type B, the a priori probability of alternation, p₀, would be 1 − (.80² + .20²), or .32.
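The adjusted chance level generalizes to any number of categories. If Trial 2 behavior were an independent draw from the same distribution as Trial 1 behavior, the probability of a repetition would be the sum of the squared proportions, and p₀ is one minus that sum. This sketch (function name hypothetical) reproduces the .32 of the example and the .50 of the equal-preference case.

```python
def chance_alternation(trial1_counts):
    """A priori probability of alternation, p0 = 1 - sum(p_i ** 2).

    trial1_counts: number of Ss making each category of response on Trial 1.
    Assumes Trial 2 choices are independent draws from the Trial 1 distribution.
    """
    total = sum(trial1_counts)
    proportions = [c / total for c in trial1_counts]
    return 1.0 - sum(p * p for p in proportions)

print(round(chance_alternation([80, 20]), 2))   # strong preference for A
print(round(chance_alternation([50, 50]), 2))   # no preference
print(round(chance_alternation([1, 1, 1]), 3))  # three-arm maze, no preference
```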

While rarely found in the experimental literature, this latter statistical procedure is adopted in a recent article by Sutherland (35). An alternative to these statistical manipulations would involve careful control of the experimental conditions so as to minimize unequal distributions of Trial 1 behaviors. Where strong preferences do occur, however, it would facilitate evaluation and comparison of data from different sources if these preferences were used in the estimate of p₀.
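Evaluating an obtained alternation percentage against p₀ can be done with an exact one-sided binomial test, treating each S's Trial 1 to Trial 2 pair as one independent observation. This is an illustration of that standard test, not a procedure attributed to the studies cited; the example numbers (15 alternators out of 20 Ss) are hypothetical. The same count yields a different p value depending on whether p₀ is taken as .50 or estimated from Trial 1 preferences.

```python
from math import comb

def p_value_at_least(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): one-sided test of excess alternation."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

k, n = 15, 20   # hypothetical: 15 of 20 Ss alternated
for p0 in (0.50, 0.32):
    print(f"p0 = {p0:.2f}: one-sided p = {p_value_at_least(k, n, p0):.5f}")
```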


GENERAL EXPLANATIONS OF ALTERNATION


In this section our concern is with the most general theoretical problem, i.e., why does S alternate? Examination and comparison of the main points of view on this issue will enable a more meaningful evaluation of the experimental data.

Hull's reactive inhibition. Alternation has been conceived of as purely a response process, explainable via Hull's concept of reactive inhibition, or I_R (17). Thus, if a left-turning response is made in a two-choice situation, a certain amount of left-turning inhibition is generated which renders the right-turning response temporarily predominant (47). Integration of the theoretical properties ascribed to I_R leads to deductions of (a) the simple occurrence of alternation behavior, (b) the spontaneous dissipation of alternation over time, (c) a direct relation between alternation and the number of forced trials to one alternative, (d) an inverse relation between alternation and the number of successively massed trials, (e) the response generalization of alternation, and (f) a direct relation between alternation and effortfulness of the response.
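Deductions (a) through (c) can be made concrete with a deliberately simplified toy model. It is not Hull's quantitative system: the exponential decay, the logistic choice rule, and the constants k and tau are assumptions introduced only for illustration. Each completed response leaves inhibition that decays over the intertrial interval, and the probability of choosing the other response grows with the amount of inhibition remaining.

```python
import math

def p_alternate(delay, forced_trials=1, k=2.0, tau=30.0):
    """Toy probability of alternating on the next free trial.

    forced_trials: number of just-completed responses to one alternative, each
    assumed to add one unit of inhibition to that response.
    delay: time between those trials and the test trial (arbitrary units);
    inhibition is assumed to decay as exp(-delay / tau).
    The choice rule is logistic in the inhibition difference, so p = .5 once
    the inhibition has fully dissipated.
    """
    inhibition = forced_trials * math.exp(-delay / tau)
    return 1.0 / (1.0 + math.exp(-k * inhibition))

for delay in (0, 30, 120, 600):
    print(f"delay {delay:>3}: p(alternate) = {p_alternate(delay):.3f}")
print(f"3 forced trials, delay 30: p(alternate) = {p_alternate(30, forced_trials=3):.3f}")
```

Alternation above .5 corresponds to deduction (a), its decline toward .5 with delay to (b), and the increase with forced trials to (c).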

Glanzer's stimulus satiation. As was noted previously, conceptions of alternation behavior need not be limited to response-dependent processes. The equally plausible conception of alternation as a stimulus process is exemplified in Glanzer's (14) explanation of the phenomenon. Basic to his system is the concept of stimulus satiation, which in formal properties is much like I_R, except that its source is in stimuli, not responses. To quote Glanzer: "Each moment an organism perceives a stimulus-object or stimulus-objects, A, there develops a quantity of stimulus satiation to A" (14, p. 259). After postulating several quantitative relationships, analogous to those ascribed to I_R, Glanzer then states that stimulus satiation reduces the organism's tendency to make any response to A. In this sense, a convenient and representative label for the concept is sI.

The similarity in formal properties ascribed to the I_R and sI concepts is evident in the fact that deductions (a) through (d) of the former system apply also to the latter. The two systems are differentiated, however, by I_R deductions (e) and (f) and by the additional sI deductions which follow: (g) the stimulus generalization of alternation, (h) response repetition, rather than response alternation, with reversal of the stimuli between trials, and (i) a direct relation between amount of alternation and intratrial interval, or length of exposure to an alternative.

Alternation as a manifestation of exploratory behavior. Not very different from Glanzer's view is Montgomery's (23, 24) explanation, which makes alternation a special case of exploratory behavior. Exploration, in turn, is explained as emerging from a curiosity drive aroused by novel stimuli. The S alternates because the alternative last entered is the less novel.

It is difficult to state precisely how the sI and curiosity drive explanations differ from each other. As a consequence of this ambiguity, or perhaps as another way of emphasizing it, there is little research directed at narrowing down these two "alternative" explanations. Possibly, the lack of deductive differentiation between the systems is due to an underlying identity which is beclouded by semantic differences. Or perhaps it is due to the failure to consider predictions from the two systems which pertain to aspects of choice-point behavior other than the choice itself.

1 The similarity of their concepts to Pavlov's (28) investigatory reflex seems not to have been explicitly noted by these authors.

For example, when an animal is placed in a simple two-choice, nonreinforcement situation, why does it leave the start-box? In accordance with the exploratory drive hypothesis, the novel stimuli beyond the start-box area elicit a curiosity drive which is expressed in the animal's approaching and investigating these stimuli. But accumulated experience with these stimuli, e.g., through massed trials, should render their novelty negligible. As a consequence, the animal's latency of selecting an alternative would be expected to increase over trials.

This prediction seems opposed to one deduced from the sI theory. Presumably, in accordance with this view, the animal satiates to the stimuli of the start-box area. Since satiation for certain stimuli reduces the organism's tendency to make any response to those stimuli, the animal avoids the start-box area by traversing the stem and selecting one of the alternatives. Over successive trials, the start-box area is experienced every time, but each alternative is experienced only about half the time. As a consequence, satiation for the start-box stimuli exceeds satiation for the alternatives at an increasing rate over successively massed trials. Hence, as trials progress, the increasing tendency for the animal to avoid the start-box area should be reflected in a decrease in the animal's latency of selecting an alternative.

Walker's action decrement. Thus far, our concern with the general explanation of alternation has centered on the distinction between stimulus- and response-oriented theories. However, psychological actions or events need not bear an exclusive relationship to either peripheral response or stimulus events. With this view, Walker (38) proposes, as an alternative to sI and I_R, the concept of an action decrement, which is a central event.

Deductive differentiation between the Glanzer and Walker systems appears to be absent except in the extension of the latter model to include roles for levels of motivation and reward. Walker hypothesizes that these two factors render greater both the immediate action decrement and a later action increment. Empirically, this would mean a greater alternation tendency initially under reward and high levels of motivation, but just the reverse after an extended time lapse.

Other explanations of alternation. The foregoing constitute the major explanations of alternation behavior. However, others are scattered throughout the literature. Alternation has been considered, for example, as an instrumental mode of activity (1, 12) and as a reaction to frustration or punishment (43). These explanations will not be discussed further, since they are unconvincing in the light of the wide variety of conditions yielding alternation behavior. Moreover, so far they have generated little, if any, research.


SOURCES OF ALTERNATION 


In the preceding section, we discussed the general explanation of alternation (why the animal alternates) and only indirectly the possible sources of alternation, or what is alternated. Source and motivation are obviously closely related, but they are nevertheless distinct issues. Our concern now is with the former.

While the I_R concept provided an adequate explanation of the typically observed variety of alternation, it could not account for some of the earliest data. In an experiment by Dennis and Sollenberger (9) in which rats were allowed to explore a Y maze, the Ss' behavior revealed a strong tendency to alternate maze alleys rather than turning responses. When allowed to explore a multiple Y maze in a second experiment, the rats also showed a marked tendency to avoid a pathway occupied shortly before (9). Results similar to these have since been reported by Montgomery (24).

That it was alleys, or more gen- 
erally stimuli, and not responses that 
were being alternated in the two-trial 
situation was an hypothesis tested by 
Montgomery (25) and by Glanzer 
(13) in essentially the same way. In 
both experiments a 4 maze was em- 
ployed with one leg blocked off on 
each trial so as to yield a conven- 
tional T maze. With one leg serving 
as the starting stem on the first trial, 
the animal was run from the opposite 
leg on Trial 2. In this manner stim- 
ulus and response components, or- 
dinarily covariate, were separated 
and pitted against each other. If the 
S alternated responses, it would necessarily repeat stimuli; to alternate
stimuli, the S would have to repeat its 
previous turning response. In both 
experiments the major source of al- 
ternation was found to be in the ex- 
ternal stimuli rather than in the rat’s 
turning response, in confirmation of 
the earlier observations of Dennis 
and Sollenberger (9) and Montgom- 
ery (24). 

In the Montgomery + maze study, 
the S’s first trial response was re- 
warded. It is therefore conceivable 
that the second trial behavior actually manifested response repetition
rather than stimulus alternation. 
Glanzer employed no reinforcement 
in his experiment, however, thus 
making the above interpretation un- 
convincing. 

In a subsequent study, which also 
lacked the use of a reinforcer, Walker, 
et al. (39) replicated the above results 
and in addition showed that extra- 
maze stimuli were less important as 
sources of alternation than were 
intramaze stimuli. They proposed 
that the relative importance of any 
of the potential sources of alterna- 
tion, including response, could be 
varied by manipulation of the dis- 
criminability of the alternatives with 
respect to these sources. Thus, even 
the response dimension might be increased in importance if the two responses were made more distinct from each other than they are in the typical T maze. This hypothesis was tested and confirmed (41). It was further suggested that “response” be thought of as “response-feedback,” making it, in this way, another stimulus dimension, rather than something unique. From this point of view, I_R deduction (e) becomes a special case of sI deduction (g).

Several other experiments have 
been concerned with determining the 
sources of alternation. Rothkopf and 
Zeaman (31) conducted a series of 
studies which yielded the general conclusion that both responses and stimuli may serve as sources of alternation. They offered the parallel concepts of “tired stimuli” and “tired responses” to account for alternation.
In another set of experiments Estes 
and Schoeffler (12) found little evi- 
dence of response alternation and 
strong evidence of stimulus alterna- 
tion. 
Also pertinent here is the recent
finding by Sutherland (35) that the 
alternation tendency is less when the 
two goal arms lead to a common goal 
box than when they lead to separate 
and distinct goal boxes.² Sutherland
argues that alternation occurs not 
just with respect to choice-point stim- 
uli, but also, and perhaps even more 
so, it occurs with respect to “the rat’s 
expectancy of the stimuli it will receive beyond the choice point”
(35, p. 361). 

Overall, the empirical picture shows 
stimuli as the general source of al- 
ternation. The more proximal the 
stimulus, e.g., intra- as opposed to 
extramaze stimuli, the greater is its 
potential as a source. The response 
may serve as a source of alternation 
only if its “afferent feedback,” a 
stimulus component, is made salient. 
In relation to theoretical issues, then, 
the data support those views which 
consider a stimulus process as under- 
lying alternation behavior. 


VARIABLES AFFECTING ALTERNA- 
TION BEHAVIOR 

In this section we are concerned
with the effect on alternation behavior of a rather heterogeneous set of
variables, some of which relate to 
characteristics of the alternatives and 
others of which pertain to the tem- 
poral duration of alternation. 

Amount of work. A crucial deduction, (f), of the I_R system led to experiments which investigated the effect on alternation of variations in the amount of work associated with the responses. In all of these experiments, amount of work was not varied between alternatives, but rather over groups of Ss, or over different stages of testing for the same Ss.

2 Dennis (7), however, using a procedure similar to Sutherland's, obtained 80% alternation, which is about the level found in the typical distinct goal-box experiments.

In general, the results of these investigations do not support the I_R explanation.

Mowrer and Jones (27) gave rats 
access to two equivalent bars in a 
Skinner box. They found no effect on 
the rats’ alternation between bars of 
variations in the amount of work re- 
quired to depress the bars. 

Solomon (34) strapped weights on 
his rats, but found no effect of this 
manipulation on amount of alternation in a T maze. In a second experiment, however, Solomon (34) varied
work by inclining the goal arms of a 
T maze 16° from the horizontal, and 
obtained a slight increase in the 
amount of alternation. In an attempt 
to replicate Solomon’s positive re- 
sults, Walker, et al. (41) also used 
inclined alleys; the rats had to climb 
a 45° slope in making either response. 
When compared with rats running in 
the conventional flat T maze, these 
animals did not show more alterna- 
tion. 

In another T maze study, Mont- 
gomery (23) failed to find a work ef- 
fect, but, as he himself points out, the 
work manipulation came at the end 
of each alley rather than at the choice 
point. That is, the amount of work in 
making the turning response was not 
varied, and it is not clear why an ef- 
fect on alternation should be expected 
under these conditions. 

Riley and Shapiro (29) also failed 
to find any but a slight influence of 
work on alternation. In their experiment, the work manipulation was effected by varying the weight of doors
through which the rats had to push in 
order to enter a reward chamber. 

From a point of view which places 
no special importance on the response
per se there is no reason to expect
work manipulations of these sorts to
affect alternation, as long as these
manipulations do not mask other differences
between the alternatives.
On the other hand, if S is so burdened 
down with work that other sources of 
differential stimulus input are trivial 
by comparison, then the alternation 
tendency might be expected to de- 
crease. There is a suggestion of such 
an effect in the Walker, et al. (41) ex- 
periment. 

It is possible to apply this inter- 
pretation to the results of an interest- 
ing experiment by Jackson (19). Rats 
were run under two conditions: in the 
first, a conventional open Y maze 
was used; in the second, the same elements of the Y were employed except
that a 15 cm. gap separated the goal 
arms from the starting alley. The 
gap was sufficiently large so that a
jumping response was required to 
cross it. While the usual alternation 
behavior was observed in the first 
condition, almost complete repetition 
was evidenced in the second. 

The above suggestion that too much work would lead to a decrease in alternation would apply here if it is further assumed that the rats’ jumping was characterized by a strong position bias. The fact that Ss were rewarded should increase any such initial repetition tendency. While this explanation seems plausible, replication and further investigation of Jackson’s results might best precede continued speculation.

For the sake of completeness we shall note one final type of work-related experiment. Some investigators (12, 46, 47) have attempted to manipulate work by varying the number of forced trials to one alternative. The general result is the expected direct relation between amount of alternation and number of forced trials.
While these data have been considered to support the I_R interpretation,
they obviously follow equally well 
from the satiation or novelty points 
of view. Thus, in general, investiga- 
tions of the work variable are either 
not theoretically crucial, or show no 
effect of work on alternation, or even 
suggest a relation which is contrary to 
the I_R prediction.

Pre-exposure. If, as is specified in 
the sI interpretation of alternation, perception of a stimulus reduces the organism's tendency to respond to that stimulus, then exposure to a stimulus prior to choice should lead to the avoidance of that stimulus if it is soon afterward encountered at a choice point.

Experiments by Glanzer (15) and 
Sutherland (35) have demonstrated 
such an effect of pre-exposure. In the 
Glanzer experiment the Ss were
placed in one goal arm of a T maze, 
detained there for one minute and 
then given a choice trial; the Ss were 
either returned to the starting stem 
by the E as is typically done, or were
allowed to return there on their own. 
In both conditions there was a signifi- 
cant tendency for the animals to 
enter the arm to which they had not 
been pre-exposed. In Sutherland's 
experiment, rats were placed and fed 
in one goal box, and then were given 
a choice trial. They, too, showed a 
preference for the alternate arm. 

However, with procedures similar
to those above, Walker, et al. (40)
failed to find any effect. In one experiment rats were exposed to black
or white stimulation in small boxes, 
and then introduced into a T maze 
with a black and a white goal arm. 
No effect of the prior exposure was 
evident in the animals’ choices. In 
another experiment the pre-exposure
took place in one of the goal boxes it- 
self. Again, there was no apparent 
effect on the rats’ subsequent choice 
of goal arms. 

It is difficult to account for the dis- 
crepancy between the results of the 
studies cited above. One difference in 
the procedures followed by Walker, 
et al. and Glanzer may be relevant. 
Glanzer’s rats were confined in the 
goal arm and apparently were free to 
wander within it as far as the choice 
point; in contrast, the Ss of the 
Walker, et al. experiment were con- 
fined in the goal box and could not re- 
turn to the choice-point area. Per- 
tinent to this procedural difference is 
the finding of Kivy, Earl, and Walker 
(20) that exposure at the choice point 
does influence an animal's subsequent performance.³

In the Kivy, et al. experiment rats 
were allowed to explore the choice- 
point region of a T maze; they could 
see into the two goal arms, but were 
prevented from entering them by
means of glass doors. During the ex- 
posure period, both arms were similar 
in brightness, e.g., black; prior to a 
subsequent choice trial one of the 
arms was changed in brightness, e.g., 
to white. The rats tended to enter the
arm which had been changed in
brightness.

This experiment, incidentally, makes sI_R, Hull's conditioned inhibition (17), an unlikely explanation of alternation, something which the Montgomery (25) and Glanzer (13) + maze studies do not accomplish. For, if the R of sI_R can refer to an approach response, conditioned inhibition remains a possible Hullian explanation of the + maze results, even though I_R is not applicable. But in the Kivy, Earl, and Walker study, any approach response was minimal since the goal arms were blocked at the choice point by glass doors.

3 Unfortunately for this argument, Sutherland's rats were also confined in a goal box away from the choice point.

Both I_R and sI_R, as well as sI, are
considered unsatisfactory explana- 
tions of the results of an experiment 
by Dember (2). In this study the pro- 
cedure was similar to that followed 
by Kivy, et al., except for one im- 
portant difference. On the exposure 
trial, one arm was black and the 
other white; on the choice-trial the 
arms were either both black or both 
white. The Ss were faced with a 
choice between two equally “satiated” or “inhibiting” stimuli. Their
responses should, therefore, have 
been randomly distributed, but they 
were not: the Ss entered the arm 
which had been changed from the 
exposure-trial condition. 

Dember’s explanation of this be- 
havior, and of alternation in general, 
makes use of the concept of environ- 
mental change, which is one source of 
novelty. This idea is further elab- 
orated by Dember and Earl (3). 

As far as alternation behavior is concerned, there is little to distinguish the Dember and Earl explanation from the one proposed by Montgomery (23, 24, 25). It is likely, however, that differences will emerge as
the general theories from which these 
explanations are derived become 
further articulated. 

Similarity of the alternatives. In 
accordance with sI deduction (g), the more similar the alternatives, the less should be the amount of alternation.⁴

Two direct tests of this hypothesis have been attempted. In the first study⁵ alternation behavior under three similarity conditions was observed: rats were run in a T maze with (a) one black and one white arm, (b) one arm black and the other light grey, and (c) one black, the other dark grey. No differences in amount of alternation were found.

4 It is interesting that Saltz (32) derives the opposite prediction from a theory which bears indirectly on alternation.

The second experiment relates to 
the similarity issue through Glanzer's 
further deduction that “the greater
the extent of S’s sense-organ damage, 
the less the amount of spontaneous 
alternation” (16, p. 263). To test 
this hypothesis, the alternation be- 
havior of a group of blinded rats was 
compared with that of a normal 
group; again no differences were 
found (5). 

These failures to confirm predic- 
tions based on the similarity issue, as 
pointed out by Dember and Roberts 
(5), are most likely attributable to 
the rats’ using dimensions other than 
vision as sources of alternation when 
visual differences are minimal or lack- 
ing. This suggests that any test of 
the similarity hypothesis must em- 
ploy methods less direct than the ones 
used above. In particular, dimen- 
sions other than the one manipulated 
must be rendered neutral with re- 
spect to the two alternatives. Sup- 
porting evidence for the similarity 
hypothesis has been obtained 
through such indirect tests. 

One test has been reported by 
Dember and Millbrook (4). In this 
experiment rats were first exposed, in 
the choice-point region, to the two 
arms of a Y maze. On the exposure 
trial one arm was grey and the other 
either black or white. On the choice 
trial, the two arms were equal in 
brightness, and both different in brightness from the values used on the exposure trial. This means that the rat was faced with a choice between two changes, one greater than the other, as, for example, a black-to-white change versus a grey-to-white change. The Ss’ behavior confirmed the deduction that the larger change, or the more dissimilar alternative, would be preferred.⁶

5 E. L. Walker, unpublished manuscript, University of Michigan.

Another indirect confirmation of the similarity prediction is contained in the previously mentioned experiment of Walker, et al. (41), where the two responses were made more discriminable than usual. This was accomplished by means of a specially designed maze which required a three-dimensional response: to go right, the rat had to turn, twist, and climb right, and to go left, do just the opposite. Under these conditions, a significant, though still small, amount of response alternation was obtained, whereas, in an otherwise comparable flat maze, response alternation was at a chance level.

Also compatible with the similar- 
ity hypothesis are the results of an 
experiment by Zeaman and Angell 
(46). In this study, rats were forced 
to one arm of a four-arm radial maze. 
The forced trial was always to one of
two arms placed at a 90° angle from 
the starting stem. The other three 
arms were separated from the forced 
arm by 60°, 120°, and 180°. After 10 
forced, reinforced trials to the T arm, 
the rat was given a free trial with all 
four arms available. Frequency of 
choice was inversely related to angle 
of separation. If separation angle is
thought of as contributing to similarity, these results provide additional
support for the similarity hypothesis.

6 Dember and Millbrook suggest that this preference for the greater change may be useful as an indicator response for purposes of stimulus scaling.

An instructive failure to find a rela- 
tion between amount of alternation 
and separation angle is provided in 
an experiment by Jackson (19). Rats 
were run in a maze with two arms 
separated by either 15°, 90°, or 180°. 
No differences were found in the 
amount of alternation produced by 
each condition. The same interpreta- 
tion apparently applies here as was 
offered of the two direct tests de- 
scribed above. Given an opportunity 
to alternate, rats will do so as long as 
the alternatives are discriminable on 
some dimension. Decreasing the dis- 
criminability of the alternatives on 
one dimension, e.g., spatial separa- 
tion, will not affect amount of alter- 
nation per se. The effect of these 
manipulations can be observed, how- 
ever, in indirect tests, in which, for 
example, the S is allowed to choose 
among alternatives which vary in
their similarity to the exposure stimuli.

Intertrial interval. Any theory of alternation must postulate some trace on Trial 2 of the events of Trial 1. The duration and strength of this trace are revealed by the relation between percentage of alternation and length of intertrial interval. Data of this sort are available, but quite varied.

For example, Montgomery (23) 
failed to obtain alternation for inter- 
vals greater than about a minute. 
Dennis (7) and Heathers (16) report 
significant alternation only for inter- 
vals up to about 2 min. Riley and 
Shapiro (30) found alternation with a 
25 sec. interval, but not with one of 
5 min. 

Walker (38), however, has obtained significant amounts of alternation for much longer intertrial intervals. He related percentage of alternation to intertrial interval for intervals ranging from about one to 300 min. Amount of alternation varied around 75% for intervals up to 60 min. and then abruptly dropped to the chance level at about 90 min.
This relationship between alterna- 
tion and intertrial interval is not the 
decreasing negatively accelerated 
function expected either from the Hullian assumption about the dissipation of I_R or from Glanzer’s analogous assumption relative to satiation. Heathers (16), however, did
find an inverse relationship over very 
short intervals ranging from 15 sec. to 
2 min. 

There are plausible bases for ex- 
plaining the wide discrepancies 
among these data. First, theoretical 
import has been ascribed to the de- 
gree of similarity between the alter- 
natives. While it is true that direct 
tests have failed to reveal an effect of 
similarity on amount of alternation, 
these tests involved relatively short 
intertrial intervals. It seems likely 
that a more difficult test, i.e., one em- 
ploying a long intertrial interval, 
would show an effect of similarity. 
Thus, experiments which differ with 
respect to the similarity variable 
might also be expected to yield differ- 
ent results pertaining to intertrial in- 
terval. To illustrate, Walker (38), 
employing a T maze with highly dif- 
ferentiated arms, found alternation 
with intervals up to at least an hour; 
Montgomery (23), who used highly 
similar alternatives, found no alter- 
nation with intervals beyond a min- 
ute. 

The second factor to be considered 
in explaining the variety of intertrial 
interval results is the procedure 
whereby the Ss experience the alternatives on Trial 1. The experiments
above all used a free-choice procedure: 
both alternatives were available to 
the rat on Trial 1. With this, in con- 
trast to a forced-choice procedure, the 
rat’s typical vacillation at the choice 
point permits a partial experiencing 
of both alternatives; this experience 
may have the effect of reducing the 
alternation tendency on Trial 2. 

Furthermore, the free-trial pro- 
cedure invites the effect of bias, as 
discussed in the Section herein on 
Definition and Measurement. This 
would also lead to an apparent de- 
crease in the amount of alternation if 
the bias effect were ignored in the 
treatment of the data. 
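To see why an unrecognized position bias lowers the observed percentage, take a deliberately simplified case (an illustrative assumption, not a model proposed in any of the studies cited): a rat with no alternation tendency whatever, which enters its preferred arm with probability p on each of two independent free trials. Alternation then occurs only when the two choices differ:

```latex
P(\text{alternation}) = p(1-p) + (1-p)\,p = 2p(1-p) = \tfrac{1}{2} - 2\left(p - \tfrac{1}{2}\right)^{2}
```

The expression reaches 50% only when p = 1/2 and falls below it for any bias; with p = 0.8, for example, it gives 2(0.8)(0.2) = 0.32, or 32%. A bias of this kind would tend to pull an animal's observed score down, so scoring against a flat 50% chance level without allowing for bias could understate whatever alternation tendency is present.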

With a forced-choice procedure 
only one alternative is available to 
the rat on Trial 1, thus decreasing 
both of the possible confounding ef- 
fects mentioned above. A forced- 
choice procedure, therefore, should 
generally yield higher levels of alter- 
nation than a free-choice procedure. 

Unfortunately, there is no single 
experiment in which intertrial inter- 
val has been varied following one 
forced trial. Nevertheless, it is possi- 
ble to find cross-study data which 
support our hypothesis. Zeaman and 
House (47) obtained 61% alternation 
with a 60-min. intertrial interval be- 
tween forced Trial 1 and free Trial 2. 
With the same procedure, but a 30- 
min. interval, Rothkopf and Zeaman 
(31) found alternation to be about 
71%. Hence, significant alternation 
was obtained with the forced-trial 
procedure at intervals far exceeding 
all but those of Walker’s (38) free- 
trial experiment. 

In evaluating this set of data, it 
would seem reasonable to assume 
that positive results, i.e., alternation 
after long intervals, reveal the basic
relationship, and that negative re- 
sults, i.e., no alternation after very 
brief intervals, are confounded by 
factors such as those we have cited. A 
serious experimental attack on this 
problem, in which degree of similarity 
of alternatives, Trial 1 procedure, 
and intertrial interval are varied, 
would be extremely valuable at this 
point. 

Other experiments on the duration 
of alternation tendency might profit- 
ably be mentioned here. Under the 
condition of a free trial following 10 
forced trials to the same arm of a T 
maze, Zeaman and House (47) were 
able to find alternation for intervals 
up to at least 12 hours. In studies 
where rats are trained to alternate, 
i.e., where reward on any trial is contingent on the S's alternating, Petrinovich and Bolles (29) and Ladieu (21)
found that some rats could alternate 
after as much as a five hour interval. 
That trained alternation is differently 
motivated from spontaneous alterna- 
tion is quite likely; nevertheless, both 
types presumably require similar 
trace mechanisms. 

The postulation of a trace which 
decays over time suggests the possi- 
bility of using alternation as an indi- 
cator response for the study of short- 
term memory. This idea was pro- 
posed as early as 1939 by Dennis (7), 
but so far has received little atten- 
tion. It has been followed in at least 
one experiment, however, which 
nicely illustrates such a use of alter- 
nation. Morgan and Wood (26) 
measured amount of alternation be- 
fore and after making lesions in vari- 
ous cortical areas. They found a 
marked decrease in alternation scores 
following either frontal or occipital 
lesions. 
Intratrial interval or exposure time. In stimulus-oriented theories, the length of time during which the S is exposed to a stimulus object determines the amount of satiation or novelty-reduction produced, and hence, the amount of alternation. Glanzer (13) reports the only experiment directed at this assumption. Rats were detained for 10 minutes after their Trial 1 response in one of three places: the end box, the start box, or the choice-point region of a T maze. As predicted, the group detained in the end box showed the greatest amount of alternation, 98%, which was even higher than that of a no-delay group.

Distributions of successive free trials. An interesting problem is posed when several free trials are given in succession at a fixed intertrial interval. Under these conditions it has generally been found that amount of alternation decreases from the first pair of trials to later pairs. Sutherland (35) gave his rats 11 trials per day with an intertrial interval of less than one minute. He obtained 81% alternation between Trials 1 and 2, but an average over all 10 pairs of only 65%. In another condition yielding a generally lower level of alternation the respective values were 65% and 49%.

Similar results were obtained ear- 
lier by Wingfield and Dennis (45). 
Over six massed trials, the amount of
alternation between successive pairs
decreased linearly from about 80% to
50%. In an experiment by Weitz and
Wakeman (43), 30 pairs of trials were 
given; in successive blocks of 10 pairs 
of trials percentage of alternation was 
found to be 74, 60, and 53 for one 
condition, and 69, 48, and 48 for an- 
other. This same decline of alterna- 
tion behavior over trials was clearly 
obtained by Riley and Shapiro (30),
but somewhat less clearly by Heath- 
ers (16) and Glanzer (13). 

Though the data generally indicate 
a decrease in alternation over trials, 
irregularities and reversals in this 
trend are often present. A more de- 
finitive approach to this matter 
might be achieved through control of 
the S’s pattern of choices over pairs 
of trials, perhaps by the use of the 
forced-trial procedure. 

One explanation of the decrease in 
alternation over trials appears obvi- 
ous. On Trial 2 S is influenced only 
by the trace of the Trial 1 events, 
whereas on Trial 3 both the Trial 1 
and Trial 2 traces may be active. If S 
has alternated between Trials 1 and
2, either alternative on Trial 3 will
at least partially satisfy the alterna- 
tion tendency. Glanzer (14) and 
Sutherland (35) have offered similar 
explanations. 

A simple test of this explanation 
also seems obvious. The decrease in 
alternation over successive trials
could be minimized under conditions 
of optimal trial spacing. This optimal 
condition would be realized if the 
interval between trials were short 
enough to yield a high percentage of 
alternation between Trials 1 and 2, 
but long enough so that by Trial 3 
the trace from Trial 1 had dissipated. 
By inference from the intertrial data, 
an interval somewhere between 15 
and 30 min. should meet these re- 
quirements. 

Spatially successive alternatives. A 
response-oriented theory of alterna- 
tion predicts the occurrence of alter- 
nation not only in the two-alterna- 
tive, two-trial situation, but also in 
situations providing spatially succes- 
sive choice points, as for example in 
a multiple T maze. This same prediction would be made by a stimulus-oriented theory which takes cognizance of response-feedback as a
potential source of alternation; re- 
sponse alternation would not be ex- 
pected, however, as long as external 
stimulus differences are sufficiently 
salient. 

The only clear evidence for succes- 
sive response alternation comes from 
an experiment by Dennis and Henne- 
man (8), in which a multiple T maze, 
composed of highly similar elements, 
was used. In a subsequent study em- 
ploying a double square maze, Den- 
nis (7) found that the response at the 
first choice point had no effect on 
that at the second choice point. But, 
on the rats’ second run in the maze 
the response previously made at each 
choice point was alternated.

It may be that the lack of successive response alternation in the Dennis (7) experiment is attributable to the shape of the maze units. Before arriving at the second choice point, a rat would have made not one, but rather, four prior turns: two rights and two lefts. This should mask any response alternation tendency.

In a recent set of experiments, Estes and Schoeffler (12) used a forced-trial procedure with several variations of a multiple T maze. They, like Dennis, found that the “alternation tendency produced by a forced turn is almost entirely specific to the point in the maze at which the forced turn occurs” (12, p. 359). Of course, Estes and Schoeffler, unlike Dennis and Henneman, were pitting response alternation against stimulus alternation, and the predominance of the latter is not surprising.

ALTERNATION BEHAVIOR AND 
LEARNING THEORY 


Alternation behavior, as a psychological issue, makes closest contact
with learning theory in those cases 
where alternation occurs despite rein- 
forcement of the first trial response. 
As Estes and Schoeffler state it: “According to any extant version of S-R theory, the reinforcement of runs to a given side of a T maze should increase the tendency to go to that side” (12, p. 357). Clearly this is not the case.

The failure of reward to increase 
the probability of the occurrence of a 
response in the two-alternative situa- 
tion is cogently demonstrated by 
studies in which a number of forced, 
reinforced trials are given to one arm 
of a T maze. Even with 10 such 
trials, alternation on a subsequent 
free-choice trial is very markedly 
present (47). 

In fact, alternation tendency ap- 
pears to be strengthened as the num- 
ber of forced, reinforced trials to one 
alternative is increased. Both Roth- 
kopf and Zeaman (31) and Zeaman 
and Angell (46) found more alterna- 
tion after 10 forced trials than after 
one and two, respectively. In the 
study by Zeaman and House (47) the 
number of forced, reinforced trials to 
one side of a T maze was varied over 
a range from one to ten. Their data 
indicate a direct relation between 
alternation tendency and the number 
of reinforced trials to the forced al- 
ternative. 

An interesting variation of the 
forced-trial technique has been em- 
ployed by Denny (10). Rats were 
given two trials a day for 24 days in a
T maze, with forced trials introduced 
in such a way that one arm was en- 
tered twice as often as the other. All 
trials were reinforced. During this 
training period there was an increasing tendency for the Ss to choose the less frequently entered (and rewarded) arm when allowed a free trial. On the day following the training period, two free trials were given,
and again a significant preference for 
the less frequently entered arm was 
evidenced. The same was still true 
when two free trials were given one 
week later. 

A striking aspect of these data 
concerns the Ss’ behavior on the sec- 
ond free-trial: with respect to the 
first free-trial response, the second 
was a repetition, not an alternation. 
With respect to the entire sequence 
of previous responses, of course, the 
second free-trial response, as well as 
the first, was to the less frequently 
visited arm. In this sense, these data 
fit better with a novelty than a tem- 
porary satiation explanation of al- 
ternation. 

The most direct attempt at handling this reinforcement “paradox” is Walker's (38) system, which was considered in the section herein on General Explanations of Alternation. It presents the view that reward can be conceived of as an “emphasizer.” More specifically: “Every
reaction produces a reaction decre- 
ment—a temporary lowering of the 
probability of the elicitation of the 
reaction. Reward contiguous with 
the reaction produces a reinforcement 
of the reaction. The more reinforcement there is, the more reaction decrement there will be. The decrement is followed in time by an increment. The more reinforcement there is, the more rapid the recovery from the decrement and the greater the eventual increment (learning)” (38, p. 167, italics omitted).

To test this idea, Walker compared 
the alternation tendencies of rein- 
forced and nonreinforced rats over a 
lengthy range of intertrial intervals. 
Some evidence was obtained to support the hypothesis that reinforcement enhances alternation behavior during the course of the reaction decrement.

The results of a recent unpublished study by Fowler and Fowler,7 however, are opposed to Walker's hypothesis. Rats were given six massed
free trials in a T maze. Reinforce- 
ment consisted in the complete reduc- 
tion in both goal arms of various elec- 
tric shock intensities present in the 
starting stem. The data revealed an 
inverse relation between alternation 
tendency and magnitude of shock re- 
duction. Moreover, the highly re- 
warded Ss showed not a single occur- 
rence of alternation. 

Related to this finding are the results of a study by DeValois (11) in which different kinds and degrees of motivation were related to variability of behavior. Employing two levels of thirst and of shock motivation, DeValois found amount of variability in a spatially successive, multichoice situation to be an inverse function of amount of motivation. These data, and more directly those of Fowler and Fowler, appear to indicate that alternation behavior is decreased, or eliminated, under conditions of high motivation and/or large magnitudes of reinforcement. It also is conceivable that escape or avoidance behavior generally shows less alternation than approach behavior, independently of strength of motivation and amount of reinforcement.

At any rate, as far as moderate drive and reinforcement are concerned, the problem posed by Estes and Schoeffler persists. For example, under the unequal reinforcement conditions of the Denny (10) study, the Ss tended to choose the less frequently rewarded arm. Even where only one approach response is reinforced, as in the selective learning situation, alternation is usually prevalent in the early training trials. From the point of view of the learning theorist, then, alternation behavior presents a serious problem. An equally difficult problem exists for the alternation theorist. “Why reinforcement disrupts the alternation pattern” is, for the latter, as crucial an issue as “why reinforced responses are alternated” is for the former. The solution of either version of the problem should contribute to the enrichment of behavior theory in general.

7 Fowler, H., and Fowler, D. E., unpublished manuscript, Yale University.


SUMMARY AND CONCLUSIONS 


The spontaneous alternation be- 
havior exhibited by rats faced with 
repeated choices of alternatives can 
no longer be explained adequately via 
Hull's concept of reactive inhibition. 
The source of alternation may sometimes be in the Ss’ responses, but more generally alternation is a reaction to stimuli, of which response
feedback is but one minor compon- 
ent. To replace the reactive inhibi- 
tion explanation of alternation an 
analogous concept, stimulus satia- 
tion, has been offered by Glanzer. A 
theory built around this concept has 
received empirical support from a 
variety of experiments. Some data, 
however, seem to require a more gen- 
eral theoretical explanation, and mo- 
tivational concepts, such as curiosity 
or response to novelty, have been 
suggested. 

Certain empirical generalizations about the variables which affect alternation seem warranted: (a) If the alternatives require equal amounts of work, then variation in this amount of work does not affect alternation. (b) Variation in the similarity of the alternatives on a single dimension does not influence amount of alternation, at least for short intervals; Ss, however, will choose from among alternatives the one which is the least similar to a previously experienced alternative. (c) Alternation may occur, following a single response, after an intertrial interval of as much as 90 min.; as number of forced responses to a single alternative increases, amount of alternation on a subsequent free trial increases, reaching a maximum after 10 forced trials. With 10 forced trials alternation may occur after an intertrial interval as great as 12 hours. (d) Increasing the time that S is in contact with an alternative increases the amount of alternation. (e) As number of massed free trials increases, amount of alternation decreases. (f) If S is presented with a set of spatially successive choice points, the amount of alternation from one to the next is small, if above chance, and probably represents the relatively weak influence of response feedback as a source of alternation. (g) Under moderate drive strength and with moderate amounts of reinforcement, equal reinforcement in both alternatives does not decrease amount of alternation, as compared with a nonreinforcement condition. Even unequal reinforcement between the alternatives yields a high percentage of alternation. (h) Under conditions of strong drive and/or large magnitudes of reinforcement, potentially equal reinforcement between alternatives yields a low percentage of alternation.

The evidence which bears on the above generalizations has been examined, and in several places suggestions for further research have been made. Attention was called to the equally difficult and equally important problems which the alternation-reinforcement relation poses for the learning theorist on the one hand and the alternation theorist on the other.

In a recent paper in this journal,
Pearson and Kley (11) complain of 


the proclivity of psychologists for assuming or 
demonstrating variables to be distributed in 
continuous fashion throughout the general 
population, but concentrating attention on the 
pathological extremes, which may, in fact, 
constitute discrete series. The familiar concept of a normal distribution for “emotional adjustment” ranging from “super-normal” and “normal” to “neurotic” and “psychotic”
has led laymen and behavioral scientists alike 
to picture human emotions in various shades 
of gray.... While such conceptualizations 
may serve a useful purpose, they may also be 
misleading. The danger lies in the temptation 
to infer continuous distribution of underlying 
etiological factors from the fact that be- 
havioral traits appear to be so distributed. 


Pearson and Kley then quote a 
study by Eysenck and Prell (8) as an 
apparent example of this fallacy. 
They say of these authors that: 


their assumption that neuroticism is on a con- 
tinuum in the general population and the 
samples employed make (sic) it impossible 
to infer that clinical cases of neurosis arise at 
the extreme end of the continuum only be- 
cause of the degree to which they inherit the 
neuroticism factor. Testwise or symptomwise, 
the diagnosed neurotics do constitute the extreme of the distribution, but the reason for
their coming to this sorry end may be quite 
different from the reasons which cause indi- 
viduals in the “borderline” or “normal” 
range of tests scores or clinical behavior to 
fall where they do. 


The point that the distribution of 
scores on a single test cannot be safe- 
ly interpreted to give a correct indi- 
cation of the distribution of the un- 
derlying determinants in the absence 
of a proper metric and in view of the 
usual large error variance is well 
taken. It was made explicitly by the writer in The Structure of Human
Personality (5, p. 11), and having al- 
ways argued against the tendency of 
psychiatrists and psychologists to as- 
sume either continuity or discontin- 
uity of normal and abnormal be- 
haviour in the absence of proof, the 
writer was not unnaturally surprised 
to find himself accused of this very
crime. It is the purpose of this brief 
note to show that this very funda- 
mental criticism of the genetic and 
other experimental work done in the 
field of abnormality by the writer and 
his colleagues is not in fact subject to 
this charge. 

We have already agreed with 
Pearson and Kley that no faith can 
be put in the distribution of scores on 
any one test in arguing for or against 
the continuity hypothesis. The writ- 
er accordingly put forward a method 
for investigating this problem along 
hypothetico-deductive lines which 
was published under the name of 
“Criterion Analysis” (2). Having 
outlined his theory of the existence of 
the general factor of neuroticism, 
similar in mode of derivation and 
general interpretation on the orectic 
side to the general factor of intelli- 
gence on the cognitive side, the writer 
went on to say that what was at issue 
in this paper was “the hypothesis 
that this putative factor of ‘neu- 
roticism’ forms a quantitative con- 
tinuum, on one extreme of which are 
to be found hospitalized neurotics, 
while so-called normals are to be 
found all the way from the near neu- 
rotic and neurotic to the conspicu- 
ously non-neurotic, mature, stable, 
and integrated type of personality”
(2, p. 42). It was also pointed out 
that a similar problem arose in con- 
nection with Kretschmer’s hypothe- 
sis of the existence of a normality- 
abnormality continuum ranging from 
the normal to the psychotic. Having 
thus indicated the problem, which of 
course also includes the identity or 
independence from each other of 
these two hypothetical continua, the 
writer went on to outline the method 
of criterion analysis, whose specific 
merit was claimed to be its ability to 
provide evidence relevant to this 
type of hypothesis. Empirical data 
have been given to demonstrate the 
truth of the continuity hypothesis, 
particularly with respect to neuroti- 
cism (2) and psychoticism (3). Later 
studies using the technique of canon- 
ical variate analysis (6, 9) have dem- 
onstrated the essential independence 
of these two continua. A detailed 
discussion of all this work is given in The Dynamics of Anxiety and Hysteria (7).

It would be open to Pearson and 
Kley to criticise this method along various lines. They might argue
against the logic underlying criterion 
analysis which postulates that if, and 
only if, there exists a continuum be- 
tween normal and abnormal mental 
states will there be found (a) corre- 
sponding factors in correlation ma- 
trices derived separately from tests 
administered to normal and abnormal groups and (b) significant correlations between these factor loadings
and what the writer has called the 
“criterion column,” i.e., the column 
of biserial correlations between nor- 
mality-abnormality on the one hand, 
and the various tests used in the ex- 
periment on the other. Such at- 
tempts as have been made in the lit- 
erature to impugn the logical validity 
of criterion analysis (e.g., 1) appear
to have rested on a misunderstanding 
of the method (4) and cannot be re- 
garded as fatal to the postulation. 
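
The two conditions on which this logic rests, matching factors in the separately computed normal and abnormal correlation matrices and a correlation between those factor loadings and the criterion column of biserial correlations, can be sketched computationally. What follows is a minimal Python illustration under stated assumptions, not the writer's own procedure: the function names are hypothetical, the first principal component (found by power iteration) stands in for whatever factor extraction is actually used, and the simulated data assume a single normally distributed latent continuum, which is precisely the continuity hypothesis under test.

```python
# Illustrative sketch only: hypothetical names; PCA by power iteration stands
# in for the factor extraction used in the original criterion analyses.
import math
import random
import statistics
from statistics import NormalDist


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def biserial(scores, is_abnormal):
    """Biserial r of one test with the normal/abnormal dichotomy; assumes the
    dichotomy cuts an underlying normal continuum."""
    hi = [s for s, a in zip(scores, is_abnormal) if a]
    lo = [s for s, a in zip(scores, is_abnormal) if not a]
    p = len(hi) / len(scores)
    h = NormalDist().pdf(NormalDist().inv_cdf(p))  # normal ordinate at the cut
    diff = statistics.fmean(hi) - statistics.fmean(lo)
    return diff / statistics.pstdev(scores) * p * (1 - p) / h


def first_factor_loadings(rows, iters=200):
    """Loadings on the first principal component of the test intercorrelations."""
    cols = list(zip(*rows))
    k = len(cols)
    r = [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):  # power iteration toward the dominant eigenvector
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvector sign is arbitrary; keep loadings mostly positive
        v = [-x for x in v]
    eigval = sum(v[i] * r[i][j] * v[j] for i in range(k) for j in range(k))
    return [x * math.sqrt(eigval) for x in v]


def criterion_analysis(normal_rows, abnormal_rows):
    """Return the criterion column and, per group, the correlation between
    that group's first-factor loadings and the criterion column."""
    rows = normal_rows + abnormal_rows
    flags = [False] * len(normal_rows) + [True] * len(abnormal_rows)
    criterion = [biserial(col, flags) for col in zip(*rows)]
    agreement = {
        "normal": pearson(first_factor_loadings(normal_rows), criterion),
        "abnormal": pearson(first_factor_loadings(abnormal_rows), criterion),
    }
    return criterion, agreement


# Hypothetical demo: eight tests loading on one latent continuum; "abnormal"
# is simply the top of that continuum (z > 1).
rng = random.Random(0)
weights = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
normal, abnormal = [], []
for _ in range(2000):
    z = rng.gauss(0, 1)
    row = [w * z + math.sqrt(1 - w * w) * rng.gauss(0, 1) for w in weights]
    (abnormal if z > 1.0 else normal).append(row)

criterion, agreement = criterion_analysis(normal, abnormal)
print([round(c, 2) for c in criterion], agreement)
```

Under this simulated continuum one would expect the criterion column to track the test weights and the normal-group agreement to come out close to 1. The demo uses only eight tests for brevity and does not apply the minimum of 20 tests or the significance screen suggested below; an "abnormal" group produced by discrete causes could be simulated by replacing the `z > 1.0` rule.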

Secondly, it might be open to Pearson and Kley to argue that the specifications of the method are not sufficiently clearly expressed to make its use feasible. Thus a large and varied battery of tests is required, all of which must discriminate significantly between the normal and the abnormal group. It might be argued that “large and varied” is too indefinite a description, and that no particular standard of significance has been specified. These objections would be well taken, and it is to be hoped that in due course it will prove possible to give a more operational definition of “varied” than is possible at present. The writer doubts, however, if these difficulties are fatal to the method. Until better criteria of selection are available, it might be suggested that the number of tests should not be below 20, with the standard of significance to be taken as the p = .01 level, and that the term “varied” should be interpreted as referring to the abilities involved in the tests used, as determined by factorial analysis; different muscle groups used in the execution of the tasks; different sense organs used in the mediation of the tasks; and so forth. It is doubtful whether in practice much doubt will arise on any of these points.

Lastly, it will be open to the critic
to point to certain weaknesses in the 
mathematical treatment of criterion 
analysis, as was done for instance by 
Lubin (10). These difficulties are 
very real, and the writer has no wish 
to gloss over them. Until they are 
completely overcome it is obviously 
necessary to use the method with 
considerable circumspection, and
only with the fullest understanding of 
the assumptions underlying each 
step taken. Nor should the interpre- 
tation of the final results be made in 
any but the most tentative fashion. 
Nevertheless, and in spite of all these 
qualifications, the writer does not 
know of any other method available 
at present which tackles this particu- 
lar problem, or which can offer us 
worthwhile information relating to it. 
Until a better method is available, 
therefore, criterion analysis will re- 
main as a worthwhile addition to our 
methodological set of tools. 

It is noteworthy, however, that 
Pearson and Kley do not criticise 
criterion analysis on any of these 
grounds. What they do instead is to 
neglect the whole body of work done 
by the writer in connection with this 
method and this problem and to pre- 
sent him as basing his views entirely 
on an invalid argument from the 
simple distribution of single test scores. This does not appear to the
writer to be a reasonable form of criti- 
cism, and consequently it seems de- 
sirable to put the point in its proper 
perspective. 

In spite of all that has been said in 
this note, the writer would not wish 
to dismiss the possibility or even the 
likelihood that in any random group 
of clinically diagnosed neurotics there 
would be found a small number of 
people who might “constitute a group 
apart, different not in degree, but in 
kind, by reason of some specified biochemical error, which is highly predictable in terms of inheritance, and which operates in a manner quite different from anything observed” in
the kinship relations of the remainder 
of that group. The evidence quoted 
makes it somewhat unlikely that the 
major part of any given neurotic or 
psychotic group would be made up of 
such individuals, but no one familiar 
with the heterogeneity of psychiatric 
groups would wish to suggest serious- 
ly that all the members of such 
groups were homogeneous with re- 
spect to hereditary processes and 
genetic determinants. Nevertheless, 
as a first approximation, we must be 
concerned with the major sources of 
variance affecting the majority of 
members of such groups, and it is in 
this connection that the writer can- 
not follow the criticism levelled by 
Pearson and Kley against the Ey- 
senck and Prell study. 


SUMMARY 


Pearson and Kley (11) criticize the 
writer for basing his belief in the con- 
tinuity of normal and abnormal 
states on the invalid consideration 
that test scores tended to be con- 
tinuous between the groups. In an- 
swer, the writer has pointed out that 
he himself had discussed the lack of 
validity of this procedure in detail 
and had advocated a different meth- 
od, namely, that of criterion analysis, 
specifically designed by him to deal 
with problems of this kind. 

Reference in our paper (3) to the study by Eysenck and Prell (1) was not intended to imply that they unwittingly subscribed to the error of inferring continuous distribution of etiologic factors giving rise to membership in either of the classes “neurotic” or “normal” from the fact
that, phenotypically, the behavior of 
both classes seems to be on a con- 
tinuum (2). Rather, our citation and 
discussion were intended to under- 
score the warning which Eysenck has 
duly set forth as to the heuristic na- 
ture of criterion analysis and the fact 
that this method and its application 
involve several arguable mathemat- 
ical and logical assumptions. Also, it 
should be noted that our emphasis on 
this warning applies not merely to 
questionable inferences drawn from 
the distribution of scores on a single test but also to inferences drawn from a distribution, however complex and polydimensional its conception.


The statement still stands that Eysenck and Prell’s criterion group of “neurotics” may have contained some individuals so categorized by reason of qualitatively different determinants from those which determined the class membership of other “neurotics” and of the “normals.”
There is simply no means inherent in 
the experiment they have described 
by which the truth or falsity of 
this statement may be determined. 
Knowledge of the relative numbers of 
clinically diagnosed neurotics with 
qualitatively different determinants 
as against those who differ from nor- 
mals only in the quantitative sense is 
of considerable importance to the application of correlational measures in
any effort to reveal the nature of un- 
derlying genotypic or causal factors. 
While the evidence we cited may not 
be particularly convincing as to the 
probable existence among clinical 
neurotics of a significant number of 
cases determined by a specific gene or 
gene pair, nevertheless, in the ab- 
sence of any proof to the contrary, 
the possibility should be entertained 
as real. If this possibility is an actual- 
ity, a potentially serious inconsisten- 
cy is introduced repeatedly in criter- 
ion analysis in the imposition of the 
assumption of continuity upon data 
where this assumption is not justified. This would be particularly true
when computing biserial correla- 
tions between normality-abnormal- 
ity and various measures of behavior. 
Whether or not this inconsistency is 
so trivial as to be safely ignored can- 
not be determined unless the method 
is proved to be adequate in an experiment where discrete primary determinants of membership in the phenotypic class, “abnormal,” are firmly
established on external grounds. The 
fact that opportunities for such ex- 
periments are rare in domains of hu- 
man behavior germane to psychomet- 
rics has made it difficult for factor 
analysts to establish the validity of 
their various methods in the identi- 
fication of causal factors. 
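
The inconsistency at issue can be made concrete through the textbook relation between the biserial and the point-biserial coefficient. With p the proportion in the “abnormal” class, q = 1 - p, and h the ordinate of the unit normal curve at the point that cuts off p, the biserial coefficient equals the point-biserial coefficient multiplied by √(pq)/h, and that factor is justified only if the dichotomy cuts an underlying normal continuum. The Python sketch below computes both (the function names are illustrative and do not come from either paper). For a measure that is genuinely two-valued, the biserial value exceeds 1, one visible symptom of imposing continuity where it does not hold.

```python
# Illustrative sketch; function names are hypothetical, not from either paper.
import math
import statistics
from statistics import NormalDist


def point_biserial(scores, is_abnormal):
    """Correlation with the dichotomy taken at face value (two discrete classes)."""
    hi = [s for s, a in zip(scores, is_abnormal) if a]
    lo = [s for s, a in zip(scores, is_abnormal) if not a]
    p = len(hi) / len(scores)
    diff = statistics.fmean(hi) - statistics.fmean(lo)
    return diff / statistics.pstdev(scores) * math.sqrt(p * (1 - p))


def continuity_factor(p):
    """sqrt(pq)/h: converts point-biserial r to biserial r, assuming the
    dichotomy cuts a normally distributed continuum at proportion p."""
    h = NormalDist().pdf(NormalDist().inv_cdf(p))  # normal ordinate at the cut
    return math.sqrt(p * (1 - p)) / h


def biserial(scores, is_abnormal):
    """Biserial r obtained by applying the continuity factor."""
    p = sum(is_abnormal) / len(is_abnormal)
    return point_biserial(scores, is_abnormal) * continuity_factor(p)


# A two-valued behaviour measure perfectly aligned with the diagnosis:
# the point-biserial is 1.0, but the biserial exceeds 1.
scores = [0, 0, 1, 1]
flags = [False, False, True, True]
print(point_biserial(scores, flags))  # 1.0
print(biserial(scores, flags))        # about 1.2533, i.e. sqrt(pi/2)
```

The factor is smallest for an even split (about 1.25 at p = .5) and grows as the split becomes more uneven (roughly 1.5 at p = .16), so a criterion column built from biserial coefficients leans more heavily on the continuity assumption when the abnormal group is small relative to the normal one.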

The availability of one such oppor- 
tunity was hinted in our paper and it 
was hoped that this might suggest a 
fruitful area of study to various work- 
ers. It may be well to amplify this 
suggestion briefly here. One might 
define operationally the phenotypic 
classes “abnormal motor coordination” and “normal motor coordination” and select subjects whose membership in the former could be
ascribed primarily to real, discrete, 
causative factors such as the genes for 
Huntington’s chorea, Wilson's dis- 
ease, Friedreich’s cerebellar ataxia, 
familial spastic paralysis, and heredi- 
tary muscular dystrophy. To these 
might be added cases in which im- 
paired motor coordination was ob- 
served to ensue following exposure to 
specific agents of trauma, infection, 
or toxemia (e.g. gunshot wound in 
the parietal area, mumps encephali- 
tis, or lead encephalopathy). With 
samples sufficiently large in each 
category and a battery of motor tests 
as large and varied as the ingenuity of 
the experimenter could provide, one 
could subject a matrix of test inter- 
correlations to factor analytic pro- 
cedures in the hope that these could 
work backward from the valid dis- 
crimination of the broad phenotypic 
categories “normal motor coordination” and “abnormal motor coordination” to the valid discrimination of
the separate genotypes identified 
previously on bases external to the 
experiment. For this to be possible, 
it would be necessary for the mathe- 
matical treatment of the test vari- 
ables to thread through a complex of 
intervening variables and hypotheti- 
cal constructs between the known 
membership in one or another of the 
discrete genotypic classes and the ob- 
served membership in either the 
“normal” or “abnormal” classes along the phenotypic continuum of “motor coordination.” Such complicating factors would include, for example, the biochemical action by
which an individual gene mediates 
the reaction which we observe as 
Huntington’s chorea and the secondary determinants of “normal” versus
“abnormal” motor coordination such 
as a reactive depression or an over- 
compensatory reaction to awareness 
of progressive organic disability. The 
biggest obstacle would probably be 
the existence of interaction effects 
between the identified genotypes and 
all other genotypic determinants of 
membership in all other classes sub- 
sumed under all other behavior do- 
mains. 

The reader may be impressed by 
the strictures which such an experi- 
ment would place upon factor ana- 
lytic methods. Admittedly, negative 
results (i.e., failure to emerge with a 
factor structure closely analogous to 
the predetermined genotypic com- 
position of the “abnormal” group) 
would not vitiate the heuristic appli- 
cation of similar procedures when 
there is no good reason to assume dis- 
crete distribution of causative fac- 
tors. On the other hand, positive re- 
sults in the experiment we have out- 
lined would be an indication that the 
imposition of assumptions of con- 
tinuity on data where these are 
known or suspected to be unjustified 
is not necessarily an inconsistency 
fatal to the method of criterion an- 
alysis. Experiments of this kind 
should be attempted by persons who 
are in a position to do them. At the 
same time, it would be well if some- 
one were to come to grips with the 
formidable task of formalizing math- 
ematical models on various discon- 
tinuity and interaction assumptions. 

Our original point was to under- 
score Eysenck’s own carefully stated 
reservations concerning the applica- 
tion of criterion analysis. We erred 
in not making this sufficiently clear. 
Here, we have tried to rectify this 
error and to amplify a suggestion 
made implicitly in the earlier paper, 
for an experimental approach to the validation of factor analytic procedures in relation to discrete genotypic determinants of human behavior.